This article considers the problem of multiple hypothesis testing using t-tests. The observed data 
are assumed to be independently generated conditional on an underlying and unknown two-state 
hidden model. We propose an asymptotically valid data-driven procedure to find critical values 
for rejection regions controlling the k-familywise error rate (k-FWER), false discovery rate 
(FDR) and the tail probability of false discovery proportion (FDTP) by using one-sample and 
two-sample t-statistics. We only require a finite fourth moment plus some very general conditions 
on the mean and variance of the population by virtue of the moderate deviations properties of 
t-statistics. A new consistent estimator for the proportion of alternative hypotheses is developed. 
Simulation studies support our theoretical results and demonstrate that the power of a multiple 
testing procedure can be substantially improved by using critical values directly, as opposed to 
the conventional p-value approach. Our method is applied in an analysis of the microarray data 
from a leukemia cancer study that involves testing a large number of hypotheses simultaneously. 

1. Introduction 

Among the many challenges raised by the analysis of large data sets is the problem 
of multiple testing. Examples include functional magnetic resonance imaging, source 
detection in astronomy and microarray analysis in genetics and molecular biology. It 
is now common practice to simultaneously measure thousands of variables or features 
in a variety of biological studies. Many of these high-dimensional biological studies are 
aimed at identifying features showing a biological signal of interest, usually through the 
application of large-scale significance testing. The possible outcomes are summarized in 
Table 1. 

Table 1. Outcomes when testing m hypotheses

Hypothesis         Accept   Reject   Total
Null true          U        V        m_0
Alternative true   F        S        m_1
Total              W        R        m


Traditional methods that provide strong control of the familywise error rate (FWER = 
$P(V \ge 1)$) often have low power and can be unduly conservative in many applications. 
One way around this is to increase the number $k$ of false rejections one is willing to 
tolerate. This results in a relaxed version of FWER, $k$-FWER $= P(V \ge k)$. 

Benjamini and Hochberg [1] (hereafter referred to as "BH") pioneered an alternative. 
Define the false discovery proportion (FDP) to be the number of false rejections divided 
by the number of rejections (FDP $= V/(R \vee 1)$). The only effect of the $R \vee 1$ in the 
denominator is that the ratio $V/R$ is set to zero when $R = 0$. Without loss of generality, we treat 
FDP $= V/R$ and define the false discovery tail probability FDTP $= P(V > \alpha R)$, where 
$\alpha$ is pre-specified, based on the application. Several papers have developed procedures 
for FDTP control. We shall not attempt a complete review here, but mention the following: 
van der Laan, Dudoit and Pollard [26] proposed an augmentation-based procedure, 
Lehmann and Romano [18] derived a step-down procedure and Genovese and Wasserman 
[13] suggested an inversion-based procedure, which is equivalent to the procedure of [26] 
under mild conditions [13]. 
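
As a small illustration of these definitions, the sketch below computes the FDP with the $R \vee 1$ convention and estimates the FDR and the FDTP from a handful of replications. The `(V, R)` pairs and the function names are hypothetical and chosen only for the example; they are not taken from the paper.

```python
def false_discovery_proportion(v, r):
    """FDP = V / max(R, 1), so the ratio is 0 when nothing is rejected."""
    return v / max(r, 1)


def empirical_fdtp(pairs, alpha):
    """Fraction of replications in which V > alpha * R."""
    exceed = sum(1 for v, r in pairs if v > alpha * r)
    return exceed / len(pairs)


# Hypothetical (V, R) counts from four simulated replications.
replications = [(0, 0), (1, 10), (3, 10), (2, 40)]

fdps = [false_discovery_proportion(v, r) for v, r in replications]
fdr_estimate = sum(fdps) / len(fdps)          # average FDP estimates the FDR
fdtp_estimate = empirical_fdtp(replications, alpha=0.2)
```

The average of the FDPs estimates the FDR, whereas the FDTP estimate only records how often the FDP exceeds the chosen proportion.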

The false discovery rate (FDR) is the expected FDP. BH provided a distribution-free, 
finite-sample method for choosing a p-value threshold that guarantees that the FDR is 
less than a target level $\gamma$. Since this publication, there has been a considerable amount of 
research on both the theory and application of FDR control. Benjamini and Hochberg [2] 
and Benjamini and Yekutieli [3] extended the BH method to a class of dependent tests. 
A Bayesian mixture model approach to obtain multiple testing procedures controlling 
the FDR is considered in [11, 21-24]. Wu [29] considered the conditional dependence 
model under the assumption of Donsker properties of the indicator function of the true 
state for each hypothesis and derived asymptotic properties of false discovery proportions 
and numbers of rejected hypotheses. A systematic study of multiple testing procedures 
is given in the book [9]. Other related work can be found in [6, 7]. 

One challenge in multiple hypothesis testing is that many procedures depend on the 
proportion of null hypotheses, which is not known in reality. Estimating this proportion 
has long been known as a difficult problem. There have been some interesting 
developments recently, for example, the approach of [20] (see also [11, 13, 17, 19]). Roughly 
speaking, these approaches are only successful under a condition which [13] calls the 
"purity" condition. Unfortunately, the purity condition depends on p-values and is hard 
to check in practice. 

The general framework for $k$-FWER, FDTP, FDR control and the estimation of the 
proportion of alternative hypotheses is based on p-values which are assumed to be known 
in advance or can be accurately approximated. However, the assumption that p-values are 
always available is not realistic. In some special settings, approximate p-values have been 
shown to be asymptotically equivalent to exact p-values for controlling FDR [12, 16]. 
However, these approximations are only helpful in certain simultaneous error control 
settings and are not universally applicable. Moreover, if the p-values are not reliable, any 
procedures derived later are problematic. 

This motivates us to propose a method to find critical values directly for rejection 
regions to control $k$-FWER, FDTP and FDR by using one-sample and two-sample 
t-statistics. The advantage of using t-tests is that they require minimal conditions on the 
population, only existence of the fourth moment, which is relatively easily satisfied by 
most statistical distributions, rather than other stringent conditions such as the existence 
of the moment generating function. In addition, we approximate tail probabilities of 
both null and alternative hypotheses accurately, rather than p-value approaches that 
only consider the case under null hypotheses. Thus, a better ranking of hypotheses is 
obtained. Furthermore, we propose a consistent estimate of the proportion of alternative 
hypotheses which only depends on test statistics. As long as the asymptotic distribution 
of the test statistic is known under the null hypothesis, we can apply our method to 
estimate this proportion, resulting in more precise cut-offs. 

The BH procedure controls the FDR conservatively at $\pi_0\gamma$, where $\pi_0$ is the proportion 
of null hypotheses and $\gamma$ is the targeted significance level. If $\pi_0$ is much smaller than 
1, then the statistical power is greatly compromised. The power we use in this paper 
is NDR $= E[S]/m_1$, as defined in [8]. In the situation that t-statistics can be used, our 
procedure gives a better approximation and more accurate critical values can be obtained 
by plugging in the estimate of $\pi_0$. The validity of our approach is guaranteed by empirical 
process methods and recent theoretical advances on self-normalized moderate deviations, 
in combination with Berry-Esseen-type bounds for central and non-central t-statistics. 

To illustrate, we simulate a Markov chain, as in [25], of Bernoulli variables $(H_i)$, $i = 
1, \ldots, 5000$, to indicate the true state of each hypothesis test ($H_i = 1$ if the alternative 
is true; $H_i = 0$ if the null is true). Conditional on the indicator, observations $X_{ij}$, $i = 
1, \ldots, 5000$, $j = 1, \ldots, 80$, are generated according to the model $X_{ij} = \mu_i + \epsilon_{ij}$. The 
one-sample t-statistic is used to perform simultaneous hypothesis testing. Figure 1 shows the 
plot of 10 000 MCMC results of the realized and nominal FDR control based on the BH 
method for different control levels. From this plot, we can see that as the control level 
increases, the BH procedure becomes more and more conservative. For instance, the FDR 
actually obtained is 0.167 when the nominal level is set at 0.2, reflecting a significant loss 
in power. 
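
The experiment just described can be sketched in a few lines. This is a minimal illustration, not the authors' code: the transition probabilities, the signal size `mu_alt`, the standard normal errors and the reduced number of tests are all assumptions made for the example, and the p-values use a normal approximation to the t-distribution.

```python
import math
import random
import statistics


def simulate_states(m, p_stay, rng):
    """Two-state Markov chain H_1..H_m; p_stay[s] = P(H_{i+1} = s | H_i = s)."""
    h = [0]
    for _ in range(m - 1):
        prev = h[-1]
        h.append(prev if rng.random() < p_stay[prev] else 1 - prev)
    return h


def one_sample_t(xs):
    """sqrt(n) * sample mean / sample standard deviation."""
    return math.sqrt(len(xs)) * statistics.fmean(xs) / statistics.stdev(xs)


def bh_reject(pvals, level):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    the largest rank with p_(k) <= k * level / m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * level / m:
            k = rank
    return set(order[:k])


rng = random.Random(0)
m, n, mu_alt, level = 1000, 80, 0.5, 0.2      # assumed settings
H = simulate_states(m, p_stay={0: 0.95, 1: 0.8}, rng=rng)
T = [one_sample_t([mu_alt * h + rng.gauss(0, 1) for _ in range(n)]) for h in H]
# Normal approximation to the two-sided p-value: P(|Z| > |t|).
p = [math.erfc(abs(t) / math.sqrt(2)) for t in T]
rejected = bh_reject(p, level)
V = sum(1 for i in rejected if H[i] == 0)
fdp = V / max(len(rejected), 1)
```

Averaging `fdp` over many repetitions gives the realized FDR that can be compared with the nominal `level`.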

The three methods of multiple testing control we utilize are $k$-FWER, FDTP and 
FDR. The criterion for using $k$-FWER is, asymptotically, 

$$P(V \ge k) \le \gamma. \tag{1.1}$$

Since we only apply our method when there are discoveries ($R > 0$), we need the FDTP, 
with a given proportion $0 < \alpha < 1$ and significance level $0 < \gamma < 1$, to satisfy, 
asymptotically, 

$$P(V > \alpha R) \le \gamma. \tag{1.2}$$

Similarly, the criterion for using FDR is, asymptotically, 

$$\mathrm{FDR} \le \gamma \quad\text{or}\quad \int_0^1 P(V > \alpha R)\,d\alpha \le \gamma. \tag{1.3}$$
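
The two forms of the FDR criterion agree because the FDP takes values in $[0, 1]$, so its expectation is the integral of its tail probability. A short worked version, using the convention FDP $= V/(R \vee 1)$ so that $\{\mathrm{FDP} > \alpha\} = \{V > \alpha R\}$ for $0 \le \alpha < 1$:

```latex
\mathrm{FDR} = E[\mathrm{FDP}]
  = \int_0^1 P(\mathrm{FDP} > \alpha)\, d\alpha
  = \int_0^1 P(V > \alpha R)\, d\alpha .
```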

The main contributions of this paper are as follows: (1) Moderate deviation results 
from probability theory, which require only a finite fourth moment of the population from 
which the statistic is computed, are applied in multiple testing. Thus, the applicability of this 
procedure is dramatically expanded: it can deal with non-normal populations and even highly 
skewed populations. (2) The critical values for rejection regions are computed directly, 
which circumvents the intermediate p-value step. (3) An asymptotically consistent 
estimator of the proportion of alternative hypotheses is developed for multiple testing 
procedures under very general conditions. 

The remainder of the paper is organized as follows. In Section 2, we present the basic 
data structure, our goals, the procedures and theoretical results for the one-sample t-test. 
Two-sample t-test results are discussed in Section 3. Section 4 is devoted to numerical 
investigations using simulation and Section 5 applies our procedure to detect significantly 
expressed genes in a microarray study of leukemia cancer. Some concluding remarks and 
a discussion are given in Section 6. Proofs of results from Sections 2 and 3 are given in 
the Appendix. 

2. One-sample t-test 

In this section, we first introduce the basic framework for simultaneous hypothesis test- 
ing, followed by our main results. Estimation of the unknown proportion of alternative 
hypotheses $\pi_1$ is presented next. We conclude the section by presenting theoretical results 
for the special case of completely independent observations. This special setting is the 
basis for the more general main results and is also of independent interest since fairly 
precise rates of convergence can be obtained. 

2.1. Basic framework 

As a specific application of multiple hypothesis testing in very high dimensions, we use 
gene expression microarray data. At the level of single genes, researchers seek to establish 
whether each gene in isolation behaves differently in a control versus a treatment 
situation. If the transcripts are paired under two conditions, then we can use a one-sample 
t-statistic to test for differential expression. 
The mathematical model is 

$$X_{ij} = \mu_i + \epsilon_{ij}, \qquad 1 \le j \le n,\ 1 \le i \le m. \tag{2.1}$$

It should be noted that the following discussion is under this model and does not hold 
in general. Here, $X_{ij}$ represents the expression level in the $i$th gene and $j$th array. Since 
the subjects are independent, for each $i$, $\epsilon_{i1}, \epsilon_{i2}, \ldots, \epsilon_{in}$ are independent random variables 
with mean zero and variance $\sigma_i^2$. The null hypothesis is $\mu_i = 0$ and the alternative 
hypothesis is $\mu_i \ne 0$. For the relationship between different genes, we propose the conditional 
independence model, as follows. Let $(H_i)$ be a $\{0, 1\}$-valued stationary process and, given 
$(H_i)_{i=1}^m$, $X_{ij}$, $j = 1, \ldots, n$, are independently generated. The dependence is imposed on 
the hypothesis $(H_i)$, where $H_i = 0$ if the null hypothesis is true and $H_i = 1$ if the 
alternative is true. From Table 1, we can see that $\sum_{i=1}^m H_i = m_1$ and $\sum_{i=1}^m (1 - H_i) = m_0$. It 
is assumed that $(H_i)$ satisfy a strong law of large numbers: 

$$\frac{1}{m}\sum_{i=1}^m H_i \to \pi_1 \in (0,1) \quad\text{a.s.} \tag{2.2}$$

This condition is satisfied in a variety of scenarios, for example, the independent case, 
Markov models and stationary models. Consider the one-sample t-statistic 

$$T_i = \sqrt{n}\,\bar X_i / S_i,$$

where 

$$\bar X_i = \frac{1}{n}\sum_{j=1}^n X_{ij}, \qquad S_i^2 = \frac{1}{n-1}\sum_{j=1}^n (X_{ij} - \bar X_i)^2.$$

If we use $t$ as a cut-off, then the number of rejected hypotheses and the number of 
false discoveries are, respectively, 

$$R = \sum_{i=1}^m 1\{|T_i| > t\}, \qquad V = \sum_{i=1}^m (1 - H_i)\,1\{|T_i| > t\}. \tag{2.3}$$

Under the null hypothesis, it is well known that $T_i$ follows a Student t-distribution with 
$n - 1$ degrees of freedom if the sample is from a normal distribution. Asymptotic 
convergence to a standard normal distribution holds when the population is completely 
unknown, provided that it has a finite fourth moment under the null hypothesis. Moreover, 
under the alternative hypothesis, $T_i$ can also be approximated by a normal distribution, 
but with a shift in location. We will show that 

$$F_0(t) := P(|T_i| > t \mid H_i = 0) = P(|Z| > t)(1 + o(1)) = 2\bar\Phi(t)(1 + o(1)), \tag{2.4}$$
$$F_1(t) := P(|T_i| > t \mid H_i = 1) = E\big[P(|Z + \sqrt{n}\,\mu_i/\sigma_i| > t \mid \mu_i, \sigma_i)\big](1 + o(1)), \tag{2.5}$$

uniformly for $t = o(n^{1/6})$ under some regularity conditions, where $Z$ denotes the standard 
normal random variable and $\bar\Phi$ is the tail probability of the standard normal distribution, and that 
the critical values $t_{n,m}$ that control the FDTP and FDR asymptotically at prescribed 
level $\gamma$ are bounded. These assumptions are fairly realistic in practice. We do not require 
the critical value for $k$-FWER to be bounded. Although we do not typically know $m_1$, 
$F_0(t)$ or $F_1(t)$ in practice, we need the following theorem, the proof of which is given 
in the Appendix, as the first step. We will shortly extend this result, in Theorem 2.2 
below, to permit estimation of the unknown quantities. 
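
The quantities introduced so far are straightforward to compute. The following sketch evaluates the one-sample t-statistic, the counts $R$ and $V$ from (2.3) and the normal tail $P(|Z| > t)$ from (2.4); the data rows, the states and the function names are hypothetical and serve only as an illustration.

```python
import math
import statistics


def one_sample_t(xs):
    """T_i = sqrt(n) * mean / sd, with the (n - 1)-denominator sample sd."""
    return math.sqrt(len(xs)) * statistics.fmean(xs) / statistics.stdev(xs)


def normal_two_sided_tail(t):
    """P(|Z| > t) = 2 * (1 - Phi(t)) for standard normal Z and t >= 0."""
    return math.erfc(t / math.sqrt(2))


def rejections_and_false_discoveries(t_stats, states, cutoff):
    """R and V at a given cut-off; states[i] is H_i (0 = null, 1 = alternative)."""
    r = sum(1 for t in t_stats if abs(t) > cutoff)
    v = sum(1 for t, h in zip(t_stats, states) if h == 0 and abs(t) > cutoff)
    return r, v


# Three hypothetical genes measured on four arrays each.
rows = [[0.1, -0.2, 0.3, 0.0], [1.0, 1.2, 0.9, 1.1], [-0.4, 0.5, 0.1, -0.2]]
states = [0, 1, 0]
t_stats = [one_sample_t(row) for row in rows]
R, V = rejections_and_false_discoveries(t_stats, states, cutoff=2.0)
```

In practice the states are unknown, so `V` is available only in simulations; `R` and the normal tail are the observable ingredients.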

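Theorem 2.1. Assume that $E(\epsilon_{ij} \mid \mu_i, \sigma_i^2) = 0$, $\mathrm{Var}(\epsilon_{ij} \mid \mu_i, \sigma_i^2) = \sigma_i^2$, $\limsup E\epsilon_{ij}^4 < \infty$, 
$0 < \pi_1 < 1 - \alpha$ and (2.2) is satisfied. Also, assume that there exist $\epsilon_0 > 0$ and $c_0 > 0$ such 
that 

$$P(|\sqrt{n}\,\mu_i/\sigma_i| \ge \epsilon_0 \mid H_i = 1) \ge c_0 \qquad \forall n \ge 1. \tag{2.6}$$

Let 

$$\mu_m(t) = \alpha m_1 F_1(t) - (1 - \alpha) m_0 F_0(t) \tag{2.7}$$

and 

$$\sigma_m^2(t) = \alpha^2 m_1 F_1(t)(1 - F_1(t)) + (1 - \alpha)^2 m_0 F_0(t)(1 - F_0(t)). \tag{2.8}$$

(i) If $t^{\mathrm{FDTP}}_{n,m}$ is chosen such that 

$$t^{\mathrm{FDTP}}_{n,m} = \inf\{t : \mu_m(t)/\sigma_m(t) \ge z_\gamma\}, \tag{2.9}$$

where $z_\gamma$ is the $\gamma$th quantile of the standard normal distribution, then 

$$\lim_{m\to\infty} P(\mathrm{FDP} > \alpha) = \lim_{m\to\infty} P(V > \alpha R) \le \gamma \tag{2.10}$$

holds. 

(ii) If $t^{\mathrm{FDR}}_{n,m}$ is chosen such that 

$$t^{\mathrm{FDR}}_{n,m} = \inf\left\{t : \frac{m_0 F_0(t)}{m_0 F_0(t) + m_1 F_1(t)} \le \gamma\right\}, \tag{2.11}$$

then 

$$\lim_{m\to\infty} \mathrm{FDR} = \lim_{m\to\infty} E(V/R) \le \gamma. \tag{2.12}$$

(iii) If $t^{k\text{-FWER}}_{n,m}$ is chosen such that 

$$t^{k\text{-FWER}}_{n,m} = \inf\{t : P(\eta(t) \ge k) \le \gamma\}, \tag{2.13}$$

where $\eta(t) \sim \mathrm{Poisson}(\theta(t))$ and 

$$\theta(t) = m_0 F_0(t),$$

then 

$$\lim_{m\to\infty} k\text{-FWER} = \lim_{m\to\infty} P(V \ge k) \le \gamma. \tag{2.14}$$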



Remark 2.1. In the next section, we use a Gaussian approximation for $F_0(t)$ and $F_1(t)$ 
for both FDTP and FDR, for which the critical values are shown to be bounded. In 
this case, $m$ can be arbitrarily large, while the critical value remains bounded. Due to 
sparsity, we use a Poisson approximation for $k$-FWER, for which the critical value is no 
longer bounded as $m \to \infty$, and we require $\log m = o(n^{1/3})$. 
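
The form of the FDTP threshold in part (i) of Theorem 2.1 can be motivated by a heuristic normal approximation. The lines below are a sketch, not the proof given in the Appendix. They assume that, given the states, the rejection indicators are independent Bernoulli variables with success probabilities $F_0(t)$ and $F_1(t)$, and that $z_\gamma$ denotes the upper quantile, $\bar\Phi(z_\gamma) = \gamma$. The mean and variance that appear are the right-hand sides of (2.7) and (2.8).

```latex
\begin{aligned}
\{V > \alpha R\} &= \{\alpha S - (1-\alpha) V < 0\}
  \qquad (R = V + S),\\
E[\alpha S - (1-\alpha) V] &= \alpha m_1 F_1(t) - (1-\alpha) m_0 F_0(t),\\
\mathrm{Var}[\alpha S - (1-\alpha) V] &= \alpha^2 m_1 F_1(t)(1 - F_1(t))
  + (1-\alpha)^2 m_0 F_0(t)(1 - F_0(t)),\\
P(V > \alpha R) &\approx
  \bar\Phi\!\left(\frac{E[\alpha S - (1-\alpha) V]}
                       {\sqrt{\mathrm{Var}[\alpha S - (1-\alpha) V]}}\right)
  \le \gamma
  \iff
  \frac{E[\alpha S - (1-\alpha) V]}
       {\sqrt{\mathrm{Var}[\alpha S - (1-\alpha) V]}} \ge z_\gamma .
\end{aligned}
```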



2.2. Main results 

Note that in Theorem 2.1, there are an unknown parameter $m_1$ and unknown functions 
$F_0(t)$ and $F_1(t)$ involved in $\mu_m(t)$ and $\sigma_m(t)$. For practical settings, we need to estimate 
these quantities. We will begin by assuming that we have a strongly consistent estimate 
of $\pi_1$ and will then provide one such estimate in the next section. Given $\mathcal{H} = \{H_i, 1 \le i \le m\}$, note that 
$p(t) = P(|T_i| > t) = (1 - H_i)P(|T_i| > t \mid H_i = 0) + H_i P(|T_i| > t \mid H_i = 1)$ can be estimated 
from the empirical distribution $p_m(t)$ of $\{|T_i|\}$, where 

$$p_m(t) = \frac{1}{m}\sum_{i=1}^m 1\{|T_i| > t\}, \tag{2.15}$$

and that $P(|T_i| > t \mid H_i = 0)$ is close to $P(|Z| > t)$ when $n$ is large, by (2.4). The next 
theorem, proved in the Appendix, provides a consistent estimate of the critical value. 



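Theorem 2.2. Let 

$$\nu_m(t) = \alpha p_m(t) - 2(1 - \hat\pi_1)\bar\Phi(t) \tag{2.16}$$

and 

$$\begin{aligned}
\tau_m^2(t) ={}& \alpha^2\big(p_m(t) - 2(1 - \hat\pi_1)\bar\Phi(t)\big)\Big(1 - \frac{1}{\hat\pi_1}\big(p_m(t) - 2(1 - \hat\pi_1)\bar\Phi(t)\big)\Big)\\
&+ 2(1 - \alpha)^2(1 - \hat\pi_1)\bar\Phi(t)(1 - 2\bar\Phi(t)),
\end{aligned} \tag{2.17}$$

where $\hat\pi_1$ is a strongly consistent estimate of $\pi_1$. Assume that the conditions of 
Theorem 2.1 are satisfied. 

(i) If $\hat t^{\mathrm{FDTP}}_{n,m}$ is chosen such that 

$$\hat t^{\mathrm{FDTP}}_{n,m} = \inf\left\{t : \frac{\sqrt{m}\,\nu_m(t)}{\tau_m(t)} \ge z_\gamma\right\}, \tag{2.18}$$

then 

$$|\hat t^{\mathrm{FDTP}}_{n,m} - t^{\mathrm{FDTP}}_{n,m}| = o(1) \quad\text{a.s.} \tag{2.19}$$

(ii) If $\hat t^{\mathrm{FDR}}_{n,m}$ is chosen such that 

$$\hat t^{\mathrm{FDR}}_{n,m} = \inf\left\{t : \frac{2(1 - \hat\pi_1)\bar\Phi(t)}{p_m(t)} \le \gamma\right\}, \tag{2.20}$$

then 

$$|\hat t^{\mathrm{FDR}}_{n,m} - t^{\mathrm{FDR}}_{n,m}| = o(1) \quad\text{a.s.} \tag{2.21}$$

(iii) If $\hat t^{k\text{-FWER}}_{n,m}$ is chosen such that 

$$\hat t^{k\text{-FWER}}_{n,m} = \inf\{t : P(\zeta(t) \ge k) \le \gamma\}, \tag{2.22}$$

where $\zeta(t) \sim \mathrm{Poisson}(\theta(t))$ and 

$$\theta(t) = 2m(1 - \hat\pi_1)\bar\Phi(t),$$

then, as long as $\log m = o(n^{1/3})$, we have 

$$|\hat t^{k\text{-FWER}}_{n,m} - t^{k\text{-FWER}}_{n,m}| = o(1) \quad\text{a.s.} \tag{2.23}$$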

Remark 2.2. This theorem deals with the general dependence case, where $(H_i)_{i=1}^m$ is 
assumed to follow a two-state hidden model and the data are generated independently 
conditional on $(H_i)_{i=1}^m$. The proof is mainly based on the independence case, which we 
present in Section 2.4 below, plus a conditioning argument. 
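
To make the plug-in rules of Theorem 2.2 concrete, the sketch below searches a finite grid of cut-offs for the FDR and $k$-FWER critical values. It is an illustration under stated assumptions rather than the authors' implementation: the infimum is approximated by the first grid point that satisfies the condition, the function names and the grid are hypothetical, and the estimate `pi1_hat` is taken as given.

```python
import math


def tail2(t):
    """Two-sided standard normal tail, P(|Z| > t) = 2 * (1 - Phi(t))."""
    return math.erfc(t / math.sqrt(2))


def poisson_sf(k, theta):
    """P(N >= k) for N ~ Poisson(theta) and integer k >= 1."""
    term, cdf = math.exp(-theta), 0.0
    for j in range(k):
        cdf += term                # adds P(N = j)
        term *= theta / (j + 1)
    return 1.0 - cdf


def fdr_cutoff(t_stats, pi1_hat, gamma, grid):
    """First grid point t with (1 - pi1_hat) * P(|Z| > t) / p_m(t) <= gamma."""
    m = len(t_stats)
    for t in grid:
        p_m = sum(1 for x in t_stats if abs(x) > t) / m
        if p_m > 0 and (1 - pi1_hat) * tail2(t) / p_m <= gamma:
            return t
    return None


def kfwer_cutoff(m, pi1_hat, k, gamma, grid):
    """First grid point t with P(Poisson(m (1 - pi1_hat) P(|Z| > t)) >= k) <= gamma."""
    for t in grid:
        if poisson_sf(k, m * (1 - pi1_hat) * tail2(t)) <= gamma:
            return t
    return None


grid = [i / 100 for i in range(0, 601)]
# Hypothetical statistics: 80 clear nulls and 20 clear alternatives.
toy_stats = [0.0] * 80 + [5.0] * 20
t_fdr = fdr_cutoff(toy_stats, pi1_hat=0.2, gamma=0.05, grid=grid)
t_fwer = kfwer_cutoff(m=10000, pi1_hat=0.2, k=1, gamma=0.05, grid=grid)
```

Because $2\bar\Phi(t) = P(|Z| > t)$, the factor `(1 - pi1_hat) * tail2(t)` equals the numerator $2(1 - \hat\pi_1)\bar\Phi(t)$ used in the FDR rule.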

2.3. Estimating $\pi_1$ 

In the previous section, we assumed that $\hat\pi_1$ was a consistent estimator of $\pi_1$. We now 
develop one such estimator. By the two-group nature of multiple testing, the test statistic 
is essentially a mixture of null and alternative hypotheses with proportion as a 
parameter. By virtue of moderate deviations, the distribution of t-statistics can be accurately 
approximated under both null and alternative hypotheses. However, for the alternative 
approximation, an unknown mean and variance are involved. So, we think of a 
functional transformation of the t-statistics which has a ceiling at 1 to first get a 
conservative estimate of $\pi_1$ which is consistent under certain conditions. Let $c > 0$ and define 
$g_c(x) = \min(|x|, c)/c$. It is easy to see that $g_c$ is a decreasing function of $c$, bounded by 
1, and that the derivative $g_c'$ is bounded by $1/c$. Hence, the function class $\{g_c\}$ indexed 
by $c$ is a Donsker class and thus also Glivenko-Cantelli. Let 

$$\bar g_c = \frac{1}{m}\sum_{i=1}^m g_c(T_i). \tag{2.24}$$


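Theorem 2.3. We have 

$$\pi_1 \ge \lim_{m\to\infty,\,n\to\infty}\ \sup_{c>0}\ \frac{\bar g_c - E(g_c(Z))}{1 - E(g_c(Z))} \quad\text{a.s.}$$

If, in addition, we assume that 

$$\sqrt{n}\,\mu_i/\sigma_i \to \infty \quad\text{for all } i \text{ with } H_i = 1,\ i = 1, \ldots, m, \text{ a.s. as } n \to \infty, \tag{2.25}$$

then 

$$\pi_1 = \lim_{m\to\infty,\,n\to\infty}\ \sup_{c>0}\ \frac{\bar g_c - E(g_c(Z))}{1 - E(g_c(Z))} \quad\text{a.s.},$$

where 

$$E(g_c(Z)) = \frac{2}{\sqrt{2\pi}\,c}\big(1 - e^{-c^2/2}\big) + 2\bar\Phi(c).$$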

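Proof. We can write 

$$\bar g_c = \frac{m_0}{m}\cdot\frac{\sum_{i=1}^m g_c(T_i)1\{H_i = 0\}}{\sum_{i=1}^m 1\{H_i = 0\}} + \frac{m_1}{m}\cdot\frac{\sum_{i=1}^m g_c(T_i)1\{H_i = 1\}}{\sum_{i=1}^m 1\{H_i = 1\}} =: \frac{m_0}{m}\,I + \frac{m_1}{m}\,II.$$

Let $\mathcal{H} = \{H_i, 1 \le i \le m\}$. Conditional on $\mathcal{H}$, $T_i$, $1 \le i \le m$, are independent random 
variables. We consider $I$ first. Let 

$$\Delta_m(c) = \frac{\sum_{i=1}^m g_c(T_i)1\{H_i = 0\}}{\sum_{i=1}^m 1\{H_i = 0\}} - \frac{\sum_{i=1}^m E(g_c(T_i))1\{H_i = 0\}}{\sum_{i=1}^m 1\{H_i = 0\}},$$

let $\mathcal{E}$ be the infinite sequence $1\{H_1 = 0\}, 1\{H_2 = 0\}, \ldots$ and let $F$ be the event that 
$\sum_{i=1}^m 1\{H_i = 0\} \to \infty$ as $m \to \infty$. By the assumption (2.2), we know that $P(F) = 1$. Thus, 

$$P\Big(\lim_{m\to\infty}\sup_{c>0}|\Delta_m(c)| = 0\Big) = E\Big[P\Big(\lim_{m\to\infty}\sup_{c>0}|\Delta_m(c)| = 0 \,\Big|\, \mathcal{E}\Big)\Big] = 1,$$

where the second equality follows from the fact that, conditional on $\mathcal{E}$, the terms in the 
sum are i.i.d. and thus the standard Glivenko-Cantelli theorem applies. Arguing similarly, 
based on conditioning on the sequence $1\{H_1 = 1\}, 1\{H_2 = 1\}, \ldots$, we can also establish that 

$$\sup_{c>0}\left|II - \frac{1}{m_1}\sum_{i=1}^m E(g_c(T_i) \mid \mathcal{H})1\{H_i = 1\}\right| \to 0 \quad\text{a.s. as } m \to \infty.$$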



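Now, note that $II \le 1$. Thus, since $m_0/m \to (1 - \pi_1)$ a.s. and $m_1/m \to \pi_1$ a.s., we have 
that when $m \to \infty$, $n \to \infty$, 

$$\bar g_c \le (1 - \pi_1)E(g_c(Z)) + \pi_1 = E(g_c(Z)) + (1 - E(g_c(Z)))\pi_1 \quad\text{a.s.}$$

We now have the following lower bound for $\pi_1$: 

$$\pi_1 \ge \lim_{m\to\infty,\,n\to\infty}\ \sup_{c>0}\ \frac{\bar g_c - E(g_c(Z))}{1 - E(g_c(Z))} \quad\text{a.s.} \tag{2.26}$$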



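Define 

$$A_1 := (1 - \pi_1)E(g_c(Z)) + \pi_1\,\frac{1}{m_1}\sum_{i=1}^m E(g_c(T_i) \mid \mathcal{H})1\{H_i = 1\},$$

$$A_2 := (1 - \pi_1)E(g_c(Z)) + \pi_1\,\frac{\sum_{i=1}^m E(g_c(Z + \sqrt{n}\,\mu_i/\sigma_i) \mid \mathcal{H})1\{H_i = 1\}}{\sum_{i=1}^m 1\{H_i = 1\}}.$$

Letting $n \to \infty$, we have $\sup_{c>0}|A_1 - A_2| \to 0$ a.s. Also, since $g_c(x) = 1$ whenever 
$|x| \ge c$, (2.25) gives, asymptotically, 

$$A_2 \ge (1 - \pi_1)E(g_c(Z)) + \pi_1 = E(g_c(Z)) + \pi_1(1 - E(g_c(Z))).$$

Note that 

$$\sup_{c>0}|\bar g_c - A_1| \to 0 \quad\text{a.s. as } m \to \infty,\ n \to \infty.$$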

t-tests in very high dimensions 




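Therefore, 

$$\bar g_c \ge E(g_c(Z)) + \pi_1(1 - E(g_c(Z))) \quad\text{a.s. as } m \to \infty,\ n \to \infty.$$

Thus, we obtain 

$$\pi_1 \le \lim_{m\to\infty,\,n\to\infty}\ \sup_{c>0}\ \frac{\bar g_c - E(g_c(Z))}{1 - E(g_c(Z))} \quad\text{a.s.} \tag{2.27}$$

□ 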

As a consequence of this theorem, we propose the following estimate of $\pi_1$: 

$$\hat\pi_1 := \sup_{c>0}\ \frac{\bar g_c - E(g_c(Z))}{1 - E(g_c(Z))}, \tag{2.28}$$

where 

$$E(g_c(Z)) = \frac{2}{\sqrt{2\pi}\,c}\big(1 - e^{-c^2/2}\big) + 2\bar\Phi(c).$$

Remark 2.3. If we use $\hat\pi_1$, as given in (2.28), then Theorem 2.2 yields a fully automated 
procedure to carry out multiple hypothesis testing in very high dimensions in practical 
data settings. 
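
The estimator (2.28) is easy to evaluate numerically. The sketch below is a minimal illustration in which the supremum over $c > 0$ is replaced by a maximum over a finite grid and the statistics are drawn from an idealised two-group normal mixture; the grid, the shift of 8 under the alternative and the function names are assumptions made for the example.

```python
import math
import random


def expected_gc_normal(c):
    """E g_c(Z) = 2 (1 - exp(-c^2 / 2)) / (c sqrt(2 pi)) + 2 (1 - Phi(c))."""
    return (2.0 * (1.0 - math.exp(-c * c / 2.0)) / (c * math.sqrt(2.0 * math.pi))
            + math.erfc(c / math.sqrt(2.0)))


def estimate_pi1(t_stats, c_grid):
    """Maximum over the grid of (gbar_c - E g_c(Z)) / (1 - E g_c(Z))."""
    m = len(t_stats)
    best = -math.inf
    for c in c_grid:
        gbar = sum(min(abs(t), c) / c for t in t_stats) / m
        e0 = expected_gc_normal(c)
        best = max(best, (gbar - e0) / (1.0 - e0))
    return best


rng = random.Random(1)
m, pi1 = 20000, 0.2
# Idealised statistics: N(0, 1) under the null, N(8, 1) under the alternative.
t_stats = [rng.gauss(8.0 if rng.random() < pi1 else 0.0, 1.0) for _ in range(m)]
pi1_hat = estimate_pi1(t_stats, c_grid=[0.5 * j for j in range(1, 11)])
```

With a strong signal, as here, the estimate should be close to the true proportion; with weaker signals Theorem 2.3 only guarantees that the limit does not exceed $\pi_1$.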



2.4. Consistency and rate of convergence under independence 

In order to prove the main results in the general, possibly dependent, t-test setting, 
we need results under the assumption of independence between t-tests. Specifically, we 
assume in this section that $(T_i, H_i)$, $i = 1, \ldots, m$, are independent, identically distributed 
random variables with $\pi_1 = P(H_i = 1)$. This independence assumption can also yield 
stronger results than the more general setting and is of independent interest. 

The next theorem, proved in the Appendix, provides a strongly consistent estimate of 
the critical value $t_{n,m}$, as well as its rate of convergence. 

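Theorem 2.4. Let 

$$\nu_m(t) = \alpha p_m(t) - 2(1 - \pi_1)\bar\Phi(t) \tag{2.29}$$

and 

$$\begin{aligned}
\tau_m^2(t) ={}& \alpha^2 p_m(t)(1 - p_m(t)) + 4\alpha(1 - \pi_1)p_m(t)\bar\Phi(t)\\
&+ 2(1 - \pi_1)\bar\Phi(t)\big(1 - 2\alpha - 2(1 - \pi_1)\bar\Phi(t)\big).
\end{aligned}$$

Assume the conditions of Theorem 2.1 with (2.2) replaced by the assumption that 
$(T_i, H_i)$, $i = 1, \ldots, m$, are i.i.d. and $\pi_1 = P(H_i = 1)$. Let $J = \{i : H_i = 1\}$ be the set that 
contains the indices of alternative hypotheses. Also, assume that $(\mu_i, \sigma_i)$ are i.i.d. for 
$i \in J$. 

(i) If $\hat t^{\mathrm{FDTP}}_{n,m}$ is chosen such that 

$$\hat t^{\mathrm{FDTP}}_{n,m} = \inf\left\{t : \frac{\sqrt{m}\,\nu_m(t)}{\tau_m(t)} \ge z_\gamma\right\}, \tag{2.30}$$

then 

$$|\hat t^{\mathrm{FDTP}}_{n,m} - t^{\mathrm{FDTP}}_{n,m}| = O\big(n^{-1/2} + m^{-1/2}(\log\log m)^{1/2}\big) \quad\text{a.s.} \tag{2.31}$$

and 

$$|\hat t^{\mathrm{FDTP}}_{n,m} - t^{\mathrm{FDTP}}_{n,m}| = O\big(n^{-1/2} + m^{-1/2}\big) \quad\text{in probability.} \tag{2.32}$$

Here, $t^{\mathrm{FDTP}}_{n,m}$ is the critical value defined in (A.26). 

(ii) If $\hat t^{\mathrm{FDR}}_{n,m}$ is chosen such that 

$$\hat t^{\mathrm{FDR}}_{n,m} = \inf\left\{t : \frac{2(1 - \pi_1)\bar\Phi(t)}{p_m(t)} \le \gamma\right\}, \tag{2.33}$$

then 

$$|\hat t^{\mathrm{FDR}}_{n,m} - t^{\mathrm{FDR}}_{n,m}| = O\big(n^{-1/2} + m^{-1/2}(\log\log m)^{1/2}\big) \quad\text{a.s.} \tag{2.34}$$

and 

$$|\hat t^{\mathrm{FDR}}_{n,m} - t^{\mathrm{FDR}}_{n,m}| = O\big(n^{-1/2} + m^{-1/2}\big) \quad\text{in probability.} \tag{2.35}$$

Here, $t^{\mathrm{FDR}}_{n,m}$ is the critical value defined in (A.28). 

(iii) If $\hat t^{k\text{-FWER}}_{n,m}$ is chosen such that 

$$\hat t^{k\text{-FWER}}_{n,m} = \inf\{t : P(\zeta(t) \ge k) \le \gamma\}, \tag{2.36}$$

where $\zeta(t) \sim \mathrm{Poisson}(\theta(t))$ and 

$$\theta(t) = 2m(1 - \pi_1)\bar\Phi(t),$$

then 

$$|\hat t^{k\text{-FWER}}_{n,m} - t^{k\text{-FWER}}_{n,m}| = o\big((\log m)^{-1/2}\big). \tag{2.37}$$

Here, $t^{k\text{-FWER}}_{n,m}$ is the critical value defined in (A.30). 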

Remark 2.4. If $\alpha = \gamma$ in Theorem 2.4, then it is not difficult to see that $t^{\mathrm{FDTP}}_{n,m} - t^{\mathrm{FDR}}_{n,m} = 
O(m^{-1/2})$ a.s. Therefore, (2.31) and (2.32) remain valid with $\hat t^{\mathrm{FDTP}}_{n,m}$ replaced by $\hat t^{\mathrm{FDR}}_{n,m}$. This 
shows that controlling FDTP is asymptotically equivalent to controlling FDR. This is 
also true in the more general dependence case. Thus, we will focus primarily on FDR in 
our numerical studies. 






Remark 2.5. Note that $\pi_1$ is assumed to be known in order to get a precise rate of 
convergence for FDTP and FDR. If $\pi_1$ is estimated with rate of convergence $r_n$, then the 
correct convergence rate for the "in probability" result for FDR and FDTP would involve 
an additional term $O(r_n)$ added in (2.32) and (2.35). It is unclear what the correction 
would be for the almost sure rate in (2.31) and (2.34). These corrections are beyond the 
scope of this paper and will not be pursued further here. Note that the rate of $\hat\pi_1$ is not 
needed in the main results presented in Sections 2.1-2.3. 

3. Two-sample t-test 

In this section, the results of the previous section are extended to the two-sample 
t-test setting. The estimator of the unknown parameter $\pi_1$ remains the same as in the 
one-sample case, but with $T_i$ in (2.24) being the two-sample, rather than one-sample, 
t-statistic. Theoretical results for the rates of convergence under independence are also 
presented, as in the previous section. 

3.1. Basic set-up and results 

When two groups, such as a control and an experimental group, are independent, which 
we assume here, a natural statistic to use is the two-sample t-statistic. As far as possible, 
we adopt the same notation as used in the one-sample case, and we assume that (2.2) 
holds. We observe the random variables 

$$X_{ij} = \mu_i + \epsilon_{ij},\quad 1 \le j \le n_1,\ 1 \le i \le m, \qquad Y_{ij} = \nu_i + \omega_{ij},\quad 1 \le j \le n_2,\ 1 \le i \le m,$$

with the index $i$ denoting the $i$th gene, $j$ indicating the $j$th array, $\mu_i$ representing the 
mean effect for the $i$th gene from the first group and $\nu_i$ representing the mean effect 
for the $i$th gene from the second group. The sampling processes for the two groups are 
assumed to be independent of each other. The sample sizes $n_1$ and $n_2$ are assumed to 
be of the same order, that is, $0 < b_1 \le n_1/n_2 \le b_2 < \infty$. We will also assume that for 
each $i$, $\epsilon_{i1}, \epsilon_{i2}, \ldots, \epsilon_{in_1}$ are independent random variables with mean zero and variance 
$\sigma_i^2$; $\omega_{i1}, \omega_{i2}, \ldots, \omega_{in_2}$ are independent random variables with mean zero and variance $\tau_i^2$. 
The null hypothesis is $\mu_i = \nu_i$, the alternative hypothesis is $\mu_i \ne \nu_i$ and the dependence 
is assumed to be generated in the same manner as the dependence in the one-sample 
setting. Consider the two-sample t-statistic 




$$T_i^* = \frac{\bar X_i - \bar Y_i}{\sqrt{S_{1i}^2/n_1 + S_{2i}^2/n_2}},$$

where 







H. Cao and M.R. Kosorok 



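$$\bar X_i = \frac{1}{n_1}\sum_{j=1}^{n_1} X_{ij}, \qquad \bar Y_i = \frac{1}{n_2}\sum_{j=1}^{n_2} Y_{ij},$$

$$S_{1i}^2 = \frac{1}{n_1 - 1}\sum_{j=1}^{n_1}(X_{ij} - \bar X_i)^2, \qquad S_{2i}^2 = \frac{1}{n_2 - 1}\sum_{j=1}^{n_2}(Y_{ij} - \bar Y_i)^2.$$

Then 

$$R = \sum_{i=1}^m 1\{|T_i^*| > t\}, \qquad V = \sum_{i=1}^m (1 - H_i)\,1\{|T_i^*| > t\}. \tag{3.1}$$
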

The two-sample t-statistic is one of the most commonly used statistics to construct 
confidence intervals and carry out hypothesis testing for the difference between two means. 
There are several premises underlying the use of two-sample t-tests. It is assumed that 
the data have been derived from populations with normal distributions. Based on the 
fact that $S_{1i} \to \sigma_i$, $S_{2i} \to \tau_i$ a.s., with moderate violation of the assumption, statisticians 
quite often recommend using the two-sample t-test, provided the samples are not too 
small and the samples are of equal or nearly equal size. When the populations are not 
normally distributed, it is a consequence of the central limit theorem that two-sample 
t-tests remain valid. A more refined confirmation of this validity under non-normality 
based on moderate deviations is shown in [4]. Furthermore, under the alternative 
hypothesis, the asymptotic results still hold, but with a shift in location similar to the 
one-sample case under certain conditions, that is, 

$$P(|T_i^*| > t \mid H_i = 0) = P(|Z| > t)(1 + o(1)),$$

$$P(|T_i^*| > t \mid H_i = 1) = P\left(\left|Z + \frac{\mu_i - \nu_i}{B_{n_1,n_2}}\right| > t\right)(1 + o(1)),$$

uniformly in $t = o(n^{1/6})$, where $B_{n_1,n_2}^2 = \sigma_i^2/n_1 + \tau_i^2/n_2$. Under the assumption of (2.2), 
asymptotic critical values to control FDTP, FDR and $k$-FWER are very similar to the 
one-sample t-test case with the one-sample t-statistic $T_i$ replaced by the two-sample 
t-statistic $T_i^*$. The following theorem, proved in the Appendix, is analogous to Theorem 2.1 
and is a necessary first step. 
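
For reference, the two-sample statistic with separate (unpooled) variance estimates can be computed as in the sketch below. The data and the function name are hypothetical; the formula is the standard unpooled two-sample t-statistic, which is assumed here to be the form intended by the text.

```python
import math
import statistics


def two_sample_t(xs, ys):
    """(mean(x) - mean(y)) / sqrt(s_x^2 / n1 + s_y^2 / n2), unpooled variances."""
    n1, n2 = len(xs), len(ys)
    se = math.sqrt(statistics.variance(xs) / n1 + statistics.variance(ys) / n2)
    return (statistics.fmean(xs) - statistics.fmean(ys)) / se


# Hypothetical expression levels for one gene in two groups.
x = [1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 2.0]
t_star = two_sample_t(x, y)
```

Once the statistics $T_i^*$ are computed for all genes, they can be passed to the same cut-off and proportion-estimation routines as the one-sample statistics.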

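Theorem 3.1. Assume that $E(\epsilon_{ij} \mid \mu_i, \sigma_i^2) = 0$, $E(\omega_{ij} \mid \nu_i, \tau_i^2) = 0$, $\mathrm{Var}(\epsilon_{ij} \mid \mu_i, \sigma_i^2) = \sigma_i^2$, 
$\mathrm{Var}(\omega_{ij} \mid \nu_i, \tau_i^2) = \tau_i^2$, $\limsup E\epsilon_{ij}^4 < \infty$, $\limsup E\omega_{ij}^4 < \infty$, $0 < \pi_1 < 1 - \alpha$ and that (2.2) 
is satisfied. Assume that there exist $\epsilon_0$ and $c_0$ such that 

$$P\left(\left|\frac{\mu_i - \nu_i}{B_{n_1,n_2}}\right| \ge \epsilon_0 \,\Big|\, H_i = 1\right) \ge c_0 \quad\text{for all } n_1, n_2. \tag{3.2}$$

The conclusions of Theorem 2.1 then hold with the one-sample t-statistic $T_i$ replaced by 
the two-sample t-statistic $T_i^*$. 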



3.2. Main results 



The unknown parameter $m_1$ and functions $F_0(t)$ and $F_1(t)$ in Theorem 3.1 are estimated 
similarly as in the one-sample case with the one-sample t-statistic replaced by its 
two-sample counterpart. The following theorem, the proof of which is given in the Appendix, 
gives our main results for two-sample t-tests. 

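Theorem 3.2. Assume that the conditions in Theorem 3.1 are satisfied. Replace the 
one-sample t-statistic $T_i$ by the two-sample t-statistic $T_i^*$ in Theorem 2.2. Let $\hat\pi_1$ be a 
strongly consistent estimate of $\pi_1$, as in (2.28), using the two-sample t-statistic $T_i^*$. 

(i) If $\hat t^{\mathrm{FDTP}}_{n,m}$ is chosen such that 

$$\hat t^{\mathrm{FDTP}}_{n,m} = \inf\left\{t : \frac{\sqrt{m}\,\nu_m(t)}{\tau_m(t)} \ge z_\gamma\right\}, \tag{3.3}$$

then 

$$|\hat t^{\mathrm{FDTP}}_{n,m} - t^{\mathrm{FDTP}}_{n,m}| = o(1) \quad\text{a.s.} \tag{3.4}$$

(ii) If $\hat t^{\mathrm{FDR}}_{n,m}$ is chosen such that 

$$\hat t^{\mathrm{FDR}}_{n,m} = \inf\left\{t : \frac{2(1 - \hat\pi_1)\bar\Phi(t)}{p_m(t)} \le \gamma\right\}, \tag{3.5}$$

then 

$$|\hat t^{\mathrm{FDR}}_{n,m} - t^{\mathrm{FDR}}_{n,m}| = o(1) \quad\text{a.s.} \tag{3.6}$$

(iii) If $\hat t^{k\text{-FWER}}_{n,m}$ is chosen such that 

$$\hat t^{k\text{-FWER}}_{n,m} = \inf\{t : P(\zeta(t) \ge k) \le \gamma\}, \tag{3.7}$$

where $\zeta(t) \sim \mathrm{Poisson}(\theta(t))$ and 

$$\theta(t) = 2m(1 - \hat\pi_1)\bar\Phi(t),$$

then, provided $\log m = o(n^{1/3})$, we have 

$$|\hat t^{k\text{-FWER}}_{n,m} - t^{k\text{-FWER}}_{n,m}| = o(1) \quad\text{a.s.} \tag{3.8}$$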

Remark 3.1. $\pi_1$ can be estimated via (2.28) by using two-sample t-statistics. 
Theorem 2.3 is applicable in the two-sample setting, as well as in the one-sample case, and 
consistency follows. Thus, Theorem 3.2 gives a fully automated procedure to conduct 
multiple hypothesis testing using two-sample t-statistics after we plug in the $\hat\pi_1$ given in 
(2.28). 



3.3. Consistency and rate of convergence under independence 

Results for the independence setting are needed for the proofs of the main results, as was 
the case for one-sample t-tests. We can, once again, obtain more precise estimation 
compared with the general dependence case. The following theorem, proved in the Appendix, 
gives us conditions and conclusions using two-sample t-statistics for controlling FDTP 
and FDR asymptotically, as well as rates of convergence under the assumption that 
$(T_i, H_i)$ are independent of each other for $1 \le i \le m$. Assume that $\pi_1$ is the proportion 
of the alternative hypotheses among $m$ hypothesis tests, that is, $\pi_1 = P(H_i = 1)$. Let 
$J = \{i : H_i = 1\}$. 

Theorem 3.3. Assume the conditions of Theorem 3.1 are satisfied. Rather than (2.2), 
we assume that $(T_i, H_i)$ are independent and identically distributed. In addition, $\pi_1 = 
P(H_i = 1)$ and $(\mu_i, \sigma_i)$ are i.i.d. for $i \in J$. Let 

$$p(t) = P(|T_i^*| > t), \tag{3.9}$$
$$a_1(t) = \alpha p(t) - (1 - \pi_1)P(|T_i^*| > t \mid H_i = 0), \tag{3.10}$$
$$\begin{aligned}
b_1^2(t) ={}& \alpha^2 p(t)(1 - p(t)) + 2\alpha(1 - \pi_1)p(t)P(|T_i^*| > t \mid H_i = 0)\\
&+ (1 - \pi_1)P(|T_i^*| > t \mid H_i = 0)\big(1 - 2\alpha - (1 - \pi_1)P(|T_i^*| > t \mid H_i = 0)\big),
\end{aligned}$$
$$p_m(t) = \frac{1}{m}\sum_{i=1}^m 1\{|T_i^*| > t\}, \tag{3.11}$$

and 

$$\nu_m(t) = \alpha p_m(t) - 2(1 - \pi_1)\bar\Phi(t), \tag{3.12}$$
$$\begin{aligned}
\tau_m^2(t) ={}& \alpha^2 p_m(t)(1 - p_m(t)) + 4\alpha(1 - \pi_1)p_m(t)\bar\Phi(t)\\
&+ 2(1 - \pi_1)\bar\Phi(t)\big(1 - 2\alpha - 2(1 - \pi_1)\bar\Phi(t)\big).
\end{aligned}$$

The conclusions of Theorem 2.4 then hold with the one-sample t-statistics $T_i$ replaced by 
the two-sample t-statistics $T_i^*$. 

Remark 3.2. In the above sections, we developed our theorems based on two-sided 
tests. The results for the case of one-sided tests are very similar, but with the rejection 
region {Ti > t} for each test. We omit the details. 



4. Numerical studies 

In this section, we present numerical studies based on simulated data and compare the 
power of our approach with [1] (BH) and [23] (ST) approaches using one-sample 
t-statistics. The results for using two-sample t-statistics are very similar and so we omit 
the details here. 

4.1. Simulation study 1 

We investigate the results for the i.i.d. case first. Recall the model 

$$X_{ij} = \mu_i + \epsilon_{ij}, \qquad 1 \le i \le m,\ 1 \le j \le n.$$

We set the signal using $\mu_i \sim \mathrm{Unif}(0.5, 1)$ or $\mu_i \sim \mathrm{Unif}(-1, -0.5)$, which is of the correct 
order for the standardized error term. Here, the number of hypothesis tests is $m = 
10\,000$, which is the same for all following simulation studies, unless otherwise noted. 
The proportion of alternatives $\pi_1 = 0.2$ and the error term $t(4)$ are used just to illustrate 
the asymptotic results. We vary the number of arrays $n$ from 20 to 50 to 300 to evaluate 
our asymptotic approximation. Empirical distributions of FDTP, FDR and $k$-FWER 
based on 100 000 repetitions are treated as the gold standard since they have almost 
negligible Monte Carlo error. The samples are generated to evaluate our proposed method 
based on asymptotic theory. Specifically, for each sample, we calculate the sample paths 
of the following quantities indexed by $t$: $\sqrt{m}\,\nu_m(t)/\tau_m(t)$ for studying FDTP, 
$2(1 - \hat\pi_1)\bar\Phi(t)/p_m(t)$ for studying FDR and $P(\mathrm{Poisson}(2m(1 - \hat\pi_1)\bar\Phi(t)) \ge 10)$ for studying 
10-FWER (here, we choose $k = 10$ just for the purposes of illustration). $\hat\pi_1$ is defined as in 
(2.28). 

Figure 2 shows the overlay of the true path and 100 random estimated paths for FDTP, 
FDR and $k$-FWER, respectively. As $n$ increases, we see that the true path and estimated 
paths are fairly close to each other, which, in turn, validates our asymptotic theory. We 
can see that the slopes of FDTP and 10-FWER are very steep, which means a small 
change in the critical value results in a large change in the level of control, while the 
FDR has a flatter trend. 

4.2. Simulation study 2 

Under the same set-up as in the previous section, we simulate data with different error
terms: standard normal ($N(0,1)$), Student t with one degree of freedom (Cauchy),
Student t with four degrees of freedom ($t(4)$), Student t with ten degrees of freedom
($t(10)$), Laplace and exponential. Note that, except for the Cauchy error term, all of
the error terms satisfy the condition of finite fourth moment. Empirical distributions of
FDTP, FDR and k-FWER based on 100 000 repetitions are treated as the gold standard
for obtaining true critical values. Each scenario is repeated 1000 times to evaluate our
proposed method for estimating the critical value based on asymptotic theory. We control
FDR at different levels (from 0.01 to 0.2) to get true and estimated critical values.
Asymptotically, the estimated critical value $\hat t$ based on our theory should be very close
to the true critical value $t$, so that the points $(t, \hat t)$ lie on the diagonal of the square. From Figure 3, the
estimated critical values $\hat t$ do not match the true critical values $t$ under the Cauchy
error since the Cauchy distribution does not have a finite fourth moment. For the Cauchy
distribution, even the central limit theorem does not hold since it does not have a finite
mean. As the number of arrays n increases, the estimated critical values $\hat t$ match the true
critical values $t$ better under symmetric error terms ($N(0,1)$, $t(4)$, $t(10)$ and Laplace), but
not quite so well under asymmetric errors (e.g., exponential errors). The difficulty with
the exponential error terms suggests the value of conducting research to derive higher
order approximations. We plan to undertake this in the near future.

[Panel title: true and 100 random estimated paths for FDTP with t(4) error term, $\alpha = 0.2$, $m = 10\,000$, $\pi_1 = 0.2$, n = 20, 50 and 300]

Figure 2. Overlay of true and 100 random estimated sample paths with respect to cut-off t for
the three procedures under differing sample sizes.



4.3. Simulation study 3 



The above results are from the independent test setting. We carried out similar simulation
studies for the dependent setting and found that the corresponding plots are quite
similar to the above results and the same conclusions can be drawn. To see whether our
proposed method obtains the claimed level of control, we use a hidden Markov chain to
generate dependent indicators $H_i$, $i = 1, \dots, m$. Conditional on $H_i$, $i = 1, \dots, m$, the data
are generated independently. The transition matrix of the hidden Markov chain is set
to

\[ \begin{pmatrix} 1 - p_1 & p_1 \\ p_0 & 1 - p_0 \end{pmatrix}, \]

where $p_1$ is the transition probability from 0 to 1 and $p_0$ is the transition probability
from 1 to 0. In the simulation, $p_0 = 0.8$ and $p_1 = 0.2$. Based on the limiting stationary
distribution, the alternative proportion should be $\pi_1 = p_1/(p_0 + p_1)$. Under the null
hypothesis, we simulate data from four error terms ($N(0,1)$, $t(4)$, Laplace and exponential)
and, under the alternative hypothesis, we simulate data with mean effects half
from $\mathrm{Unif}(0.1, 0.5)$ and half from $\mathrm{Unif}(-0.5, -0.1)$, plus the same four error terms. Figure 4
uses FDR as the control criterion. For different control levels $\gamma$, we compare the
claimed level of control and the actually obtained level of control based on our method
for different numbers of arrays: small (n = 20), medium (n = 50) and large (n = 300).

Figure 3. Comparison of true and estimated critical values using FDR for different error terms
and numbers of arrays n.
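The generation of the dependent indicators can be sketched in Python as follows. This is an illustrative sketch only: the function name is hypothetical, and starting the chain from its stationary distribution is an assumption that the text does not state.

```python
import random

def simulate_indicators(m, p0, p1, rng):
    """Two-state Markov chain H_1..H_m with p1 = P(0 -> 1) and p0 = P(1 -> 0)."""
    pi1 = p1 / (p0 + p1)                 # stationary probability of state 1
    h = 1 if rng.random() < pi1 else 0   # assumed: start from the stationary law
    path = [h]
    for _ in range(m - 1):
        if h == 0:
            h = 1 if rng.random() < p1 else 0
        else:
            h = 0 if rng.random() < p0 else 1
        path.append(h)
    return path

rng = random.Random(0)
H = simulate_indicators(10_000, p0=0.8, p1=0.2, rng=rng)
print(sum(H) / len(H))  # empirical proportion of alternatives
```

The empirical proportion of ones can then be compared with the stationary value $p_1/(p_0 + p_1)$.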

From Figure 4, we can see that when the number of arrays n is small (n = 20), we do
not, in general, achieve the claimed level of control. If we have a medium sample size
(n = 50), the obtained level of control is very close to the nominal level of control, and
the results are almost perfect if we have a large number of arrays (n = 300), even for the
asymmetric exponential error term. This strongly supports our theoretical predictions
but suggests that higher order approximations would be useful in some settings.



Table 2. Obtained control level using 10-FWER with nominal control level 0.05

n     N(0,1)            t(4)              Laplace           Exponential
20    0.998 (9.0e-05)   0.90 (7.0e-03)    0.81 (1.1e-02)    1 (0)
50    0.52 (1.2e-02)    0.14 (9.1e-03)    0.17 (1.2e-02)    1 (0)
300   0.076 (3.8e-03)   0.031 (2.8e-03)   0.05 (2.7e-03)    0.82 (4.6e-03)

Figure 4. Comparison of nominal and obtained control level for different error terms and
numbers of arrays n.



To see the performance of our method using 10-FWER, Table 2 summarizes the control 
level actually obtained for different error terms and numbers of arrays n when the nominal 
control level is 0.05. The obtained control level is incorrect when the number of arrays n is 
small, which can be deduced from the sample paths of 10-FWER given in Figure 1. It has
a very steep slope, so when n is small, the approximation is crude and there is a noticeable 
difference between the estimated critical value and the true critical value, yielding a 
big difference in the control level. For large sample sizes, the obtained control level is 
reasonably good because our asymptotic theory begins to take effect. The exponential 
error setting appears not to perform as well as the other error settings. 
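To make the steepness of the 10-FWER path concrete, here is a hedged Python sketch of the Poisson-based cut-off: the smallest t with P(Poisson(2m(1 - pi1)(1 - Phi(t))) >= k) <= gamma. It assumes `pi1` is supplied directly (the estimate in (2.28) is not reproduced here) and uses bisection, which relies on the level being decreasing in t; the function names are hypothetical.

```python
import math

def normal_tail(t):
    """Upper tail 1 - Phi(t) of the standard normal distribution."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def poisson_tail(lam, k):
    """P(Poisson(lam) >= k), computed as one minus the sum of the first k terms."""
    term = math.exp(-lam)
    cdf = 0.0
    for j in range(k):
        cdf += term
        term *= lam / (j + 1)
    return max(0.0, 1.0 - cdf)

def kfwer_critical_value(m, pi1, k, gamma, lo=0.0, hi=10.0, tol=1e-8):
    """Bisection for the smallest t whose Poisson-approximated k-FWER is <= gamma."""
    def level(t):
        return poisson_tail(2.0 * m * (1.0 - pi1) * normal_tail(t), k)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if level(mid) <= gamma:
            hi = mid   # level already controlled: move the upper end down
        else:
            lo = mid   # level too large: move the lower end up
    return hi

t_crit = kfwer_critical_value(m=10_000, pi1=0.2, k=10, gamma=0.05)
print(t_crit)
```

Evaluating the level slightly below `t_crit` shows how quickly the control level deteriorates, which is the sensitivity described above.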



4.4. Simulation study 4 

All previous numerical studies involve the alternative proportion estimate $\hat\pi_1$ defined
in (2.28). In this section, we investigate numerically how this estimate is affected by
the number of arrays n and compare it with the estimate proposed by [23]. The
first simulation set-up is similar to the one in the previous section. We drew N = 1000
sets of data as follows. Dependent indicators $H_i$, $i = 1, \dots, m$, are generated from a hidden
Markov chain with the limiting alternative proportion $\pi_1 = 0.2$. Conditional on these, a
vector of expected values, $\mu = (\mu_1, \dots, \mu_m)$, was constructed. The expected values for
the true null hypotheses were set to 0 with standard normal noise, whereas the expected
values for the alternative hypotheses were drawn from $\mathrm{Unif}(0.1, 0.5)$ plus standard normal
noise. Correspondingly, 1000 replications of the proportion estimate $\hat\pi_1$ were calculated
using (2.28). The root mean square error (RMSE) is given as

\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \bigl(\hat\pi_1^{(n)} - \pi_1^{(n)}\bigr)^2}, \]

where $\hat\pi_1^{(n)}$ is the estimate of $\pi_1$ for the nth simulated data set and $\pi_1^{(n)}$ is the truth.
Table 3 summarizes the effect of n. As the number of arrays n increases, the RMSE gets
smaller, which validates our asymptotic prediction.
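The RMSE over replications can be computed as in the short Python sketch below. The helper name is hypothetical, and the inputs are assumed to be two equal-length lists: the per-replication estimates and the corresponding true proportions.

```python
import math

def rmse(estimates, truths):
    """Root mean square error between paired estimates and true values."""
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(e - t) ** 2 for e, t in zip(estimates, truths)]
    return math.sqrt(sum(squared) / len(squared))

# Toy example with two replications (values are illustrative, not from the paper).
print(rmse([0.1, 0.3], [0.2, 0.2]))
```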

In the second simulation, we compare our proportion estimate with the one using
spline smoothing proposed by [23]. Recall the proportion estimate $\hat\pi_0(\lambda) = \#\{p_i > \lambda;\ i =
1, \dots, m\}/(m(1 - \lambda))$. The smoothing approach proceeds as follows: first, $\hat\pi_0(\lambda)$ is calculated
over a (fine) grid of $\lambda$; then, a natural cubic spline $y$ with three degrees of freedom is
fitted to $(\lambda, \hat\pi_0(\lambda))$; finally, $\pi_0$ is estimated by $\hat\pi_0 = y(1)$. The simulation set-up is similar
to the previous one, except that we have two groups here with $n_1 = 70$ and $n_2 = 80$. We
vary the alternative proportion to compare the performance of our approach ($\hat\pi_1^{CK}$)
with the spline smoothing approach ($\hat\pi_1^{ST}$) in Table 4. They produce very similar results;
both are conservative, with less bias using our approach and less variance using the spline
smoothing approach. The advantage of our approach is that it is computationally very
fast, while the spline smoothing approach requires that p-values are first obtained using
permutation, which is computationally much more intensive than our approach (which
can be computed directly from the t-statistics).
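The first step of the smoothing approach, evaluating the p-value based proportion estimate over a grid of thresholds, can be sketched in Python as follows. Function names are hypothetical, and the spline-fitting step is deliberately omitted, so this is only the raw estimate #{p_i > lambda} / (m(1 - lambda)).

```python
def pi0_hat(pvalues, lam):
    """Raw null-proportion estimate #{p_i > lam} / (m (1 - lam))."""
    m = len(pvalues)
    return sum(p > lam for p in pvalues) / (m * (1.0 - lam))

def pi0_grid(pvalues, lambdas):
    """Pairs (lam, pi0_hat(lam)) over a grid; a smoother would be fitted to these."""
    return [(lam, pi0_hat(pvalues, lam)) for lam in lambdas]

# Toy p-values: 800 evenly spread "null" values plus 200 zeros standing in for signals.
toy = [(i + 0.5) / 800 for i in range(800)] + [0.0] * 200
print(pi0_grid(toy, [0.25, 0.5, 0.75]))
```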



Table 3. RMSE for N = 1000 estimated values of $\pi_1$

n       20       50       300
RMSE    0.0156   0.0136   0.0104



Table 4. Proportion estimate comparison

$\pi_1$                   0.05    0.1     0.15    0.2     0.25    0.3     0.35    0.4     0.45
$\hat\pi_1^{CK}$          0.044   0.091   0.141   0.182   0.217   0.255   0.289   0.335   0.365
$\hat\pi_1^{ST}$          0.041   0.081   0.125   0.161   0.195   0.236   0.276   0.323   0.355
sd($\hat\pi_1^{CK}$)      0.042   0.043   0.041   0.040   0.046   0.041   0.047   0.042   0.038
sd($\hat\pi_1^{ST}$)      0.039   0.041   0.036   0.040   0.041   0.038   0.034   0.036   0.031

4.5. Comparison with BH and ST procedures

In this section, we compare our approach with the BH and ST procedures under the
dependence structure described in [29]. We also use a hidden Markov model to simulate
the indicators $H_i$, $i = 1, \dots, m$. Conditional on $H_i$, $i = 1, \dots, m$, the data are
generated independently. The number of hypotheses tested is m = 5000 and the number
of arrays is n = 80. The data generating mechanism is otherwise the same as in the
independence case. First, we construct a one-sample t-statistic and apply our procedure to
obtain the critical value for the rejection region. We then obtain p-values and q-values,
and apply the BH and ST procedures to decide which genes are significantly expressed.
We now briefly describe the BH procedure. Let $p_i$ be the marginal p-value of the ith test,
$1 \le i \le m$, and let $p_{(1)} \le \dots \le p_{(m)}$ be the order statistics of $p_1, \dots, p_m$. Given a control
level $\gamma \in (0,1)$, let

\[ r = \max\{ i \in \{0, 1, \dots, m+1\} : p_{(i)} \le \gamma i/m \}, \]

where $p_{(0)} = 0$ and $p_{(m+1)} = 1$. The BH procedure rejects all hypotheses for which $p_{(i)} \le p_{(r)}$.
If r = 0, then all hypotheses are accepted. The q-value in [23] is similar to the well-known
p-value, except that it is a measure of significance in terms of FDR, rather than
type I error, and an estimate of the alternative proportion is plugged in, based on available
p-values, as described in the previous section. We revisit the motivating example and give
a plot of the claimed FDR and the actually obtained FDR using the proposed critical
value method. From Figure 5, we can see that our procedure controls the FDR at the
claimed level asymptotically, although somewhat liberally for finite samples, and has
better power at the same target FDR level compared with the BH and ST procedures.
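For reference, the BH step-up rule can be sketched in Python as below. This is the standard textbook form of the procedure (largest rank whose ordered p-value is at most gamma times rank over m), written with a hypothetical function name; it is not taken from the authors' code.

```python
def benjamini_hochberg(pvalues, gamma):
    """Return a list of booleans: True where the BH procedure rejects at level gamma."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by increasing p-value
    r = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= gamma * rank / m:
            r = rank  # keep the largest rank that satisfies the step-up condition
    rejected = [False] * m
    for idx in order[:r]:
        rejected[idx] = True
    return rejected

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, 0.05))
```

Because the rule is step-up, a small p-value that fails its own threshold can still be rejected when a larger ordered p-value meets its threshold.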

5. Applications to microarray analysis 

We now apply the proposed procedure to the analysis of a leukemia cancer data set
[14] in order to identify differentially expressed genes between AML and ALL. For the
original data, see http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi. In this
analysis, we use the methodology developed for the dependence case. The raw data
consist of m = 7129 genes and 72 samples coming from two classes: 47 in class ALL (acute
lymphoblastic leukemia) and 25 in class AML (acute myeloid leukemia). Our simulation
results showed reasonable performance of the procedure for a moderate sample size in
this range. For each gene location, the two-sample t-statistic comparing the 47 ALL
responses with the 25 AML responses was computed. Using our proposed approach for
the dependent case, we find the critical value for controlling FDR at level $\gamma$,

\[ \hat t = \inf\Bigl\{ t : \frac{2(1-\hat\pi_1)\bar\Phi(t)}{p_m(t)} \le \gamma \Bigr\}, \]

where $p_m(t) = \sum_{i=1}^m 1\{|T_i| > t\}/m$ and $\pi_1$ is estimated by (2.28).




Figure 5. FDR control and power comparison.

In Figure 6, we plot the FDR level and the number of significantly expressed genes
by our (CK) procedure, the BH procedure and the q-value based Storey-Tibshirani (ST)
procedure. From the plot, we can see that our procedure detects the largest number of
significant genes, followed by the ST procedure and then the BH procedure, which is
the most conservative one. At FDR level 0.01, we detected 870 genes, the ST procedure
detected 778 genes and the BH procedure detected 614 genes. Using the two-sample
t-test, and in line with the higher power of our approach in the simulation studies, we detected
all of the genes that the other two approaches detected. The BH procedure is very
conservative at the expense of power loss. The ST procedure requires permutation to
obtain p-values, while our procedure gets the critical value directly and is thus faster in
terms of computation. The estimate of $\pi_1$ is 0.467 by our procedure and 0.477 by the
ST procedure. These results can serve as a first exploratory step for more refined analyses
concerning these significant genes. Another issue may be that the critical value approach
based on asymptotic FDR control may not be conservative enough in some settings.




Figure 6. Comparison between our (CK) procedure, the ST procedure and the BH procedure
using real data.

6. Concluding remarks and discussion

We have presented a new approach for the significance analysis of thousands of features in 
high-dimensional biological studies. The approach is based on estimating the critical val- 
ues of the rejection regions for high-dimensional multiple hypothesis testing, rather than 
the conventional p-value approaches in the literature. We developed a detailed method 
that can be used to identify differentially expressed genes in microarray experiments. 
The proposed procedure performs well for large samples, reasonably well for intermedi- 
ate samples and not quite as well for small samples, and appears to perform better than 
existing alternatives under realistic sample sizes. Our method is also computationally 
faster than the competing approaches. The potential for improvement in small-sample 
performance motivates the need for a second-order expansion of our theoretical work. In 
addition, we have proposed a new consistent estimate of the proportion of alternative 
hypotheses under certain conditions. Numerical studies demonstrate that our methodol- 
ogy fits the truth well and improves the statistical power in multiple testing. Extensions 
of the current work can be pursued in several directions. 

First, as stated above, the precision of the asymptotic approximations has room for
improvement in small-to-moderately-small sample sizes, suggesting that a second-order
expansion would be valuable. Second, in the dependence case, it would be of interest to
see how the rate of convergence could be derived under various assumptions on the form
of the dependence. Third, the plug-in estimator $\hat\pi_1$ is consistent, but somewhat ad hoc.
The complete theoretical properties of this estimator remain to be explored. Last, but not
least, we only considered a fixed proportion $\pi_1$ of alternative hypotheses. It is of great
interest also to consider the sparsity setting, in which $\pi_1 \to 0$ as $m \to \infty$, and to see what
patterns emerge.



Appendix: Proofs of main results 

Our main tools are limit theorems of empirical processes, Berry-Esseen bounds and
self-normalized moderate deviations for one- and two-sample t-statistics.

A.1. Preliminary lemmas

We first state a non-uniform Berry-Esseen inequality for nonlinear statistics. 

Lemma A.1 ([5]). Let $\xi_1, \xi_2, \dots, \xi_n$ be independent random variables with $E\xi_i = 0$,
$\sum_{i=1}^n E\xi_i^2 = 1$ and $E|\xi_i|^3 < \infty$. Let $W_n = \sum_{i=1}^n \xi_i$ and let $\Delta = \Delta(\xi_1, \dots, \xi_n)$ be a measurable
function of $\{\xi_i\}$. Then

\[ |P(W_n + \Delta \le z) - \Phi(z)| \le P(|\Delta| > (|z|+1)/3) + C(1+|z|)^{-3}\Bigl( \|\Delta\|_2 + \sum_{i=1}^n (E\xi_i^2)^{1/2}\bigl(E(\Delta - \Delta_i)^2\bigr)^{1/2} + \sum_{i=1}^n E|\xi_i|^3 \Bigr). \tag{A.1} \]

This is [5], Theorem 2.2, and the proof can be found there. The next lemma provides
a Berry-Esseen bound for non-central t-statistics.

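Lemma A.2. Let $X, X_1, \dots, X_n$ be i.i.d. random variables with $E(X) = 0$, $\sigma^2 = EX^2$
and $EX^4 < \infty$. Let

\[ \bar X = \frac{1}{n}\sum_{i=1}^n X_i, \qquad s_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2. \]

Then

\[ \Bigl| P\Bigl( \frac{\sqrt{n}(\bar X + c)}{s_n} \le x \Bigr) - \Phi(x - \sqrt{n}c/\sigma) \Bigr| \le \frac{K(1+|x|)}{(1 + |x - \sqrt{n}c/\sigma|)\sqrt{n}} \tag{A.2} \]

for any $c$ and $x$, where $K$ is a finite constant that may depend on $\sigma$ and $EX^4$.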
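Proof. Without loss of generality, assume that $x \ge 0$ and $\sigma = 1$. Using the fact that

\[ 1 - |t| \le (1+t)^{1/2} \le 1 + |t| \quad \text{for } t \ge -1, \tag{A.3} \]

we have

\[ x s_n = x(1 + s_n^2 - 1)^{1/2} \le x(1 + |s_n^2 - 1|) \tag{A.4} \]

and

\[ x s_n \ge x(1 - |s_n^2 - 1|). \tag{A.5} \]

Therefore,

\[ P\Bigl( \frac{\sqrt{n}(\bar X + c)}{s_n} \le x \Bigr) = P(\sqrt{n}\bar X \le x s_n - \sqrt{n}c) \le P(\sqrt{n}\bar X \le x - \sqrt{n}c + x|s_n^2 - 1|). \]

We now apply (A.1) with $\xi_i = X_i/\sqrt{n}$, $W_n = \sqrt{n}\bar X$ and

\[ z = x - \sqrt{n}c, \qquad \Delta = -x|s_n^2 - 1|, \qquad \Delta_i = -x|s_{n,i}^2 - 1|, \]

where $s_{n,i}^2$ is defined as $s_n^2$ with 0 replacing $X_i$.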
Noting that 

si 



(A.6) 



- 1 = ^ (j2iX^ - 1) - - X,/n 
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we have 

E\sl^l\^ <KEX^/n (A.7) 



and 



E{sl - si,)' = ^-l-^E{{Xf - 1) - nX' + n{X - X^jnf + 1)' 

^ E{{X} - 1) - X,{2{X - X,/n) + X,/n) + if 



(n-l)2 

- (^T^^^^^^*' - 1)' + 2 + Xf{2{X - X,/n) + X,/nf) (A.8) 

- (n-2)2 ^'^'^^^ + ^ + ^^i^(8(^ - ^^/")^ + 2EX^/n)) 



< KEX^/n^. 
It follows from (A.7) and (A.8) that 



l|A||2<A^ 



\x\'/EX^ 



E(i?en^/^(i?(A-A,)^)^/^<i^^^' 



and 



Therefore, by (A.l), 



\P{^X <x- V^c + x\sl - 1|) - $(.T - V^c)| < /^^^^J^'j (A.9) 

[L + \x — y/nc\)^n 



Similarly,

\[ P\Bigl( \frac{\sqrt{n}(\bar X + c)}{s_n} \le x \Bigr) \ge P(\sqrt{n}\bar X \le x - \sqrt{n}c - x|s_n^2 - 1|) \]

and

\[ |P(\sqrt{n}\bar X \le x - \sqrt{n}c - x|s_n^2 - 1|) - \Phi(x - \sqrt{n}c)| \le \frac{K(1+|x|)}{(1 + |x - \sqrt{n}c|)\sqrt{n}}. \tag{A.10} \]

This proves (A.2). □



We also need a moderate deviation for the non-central t-statistics, as given in the 
following lemma. 

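Lemma A.3. Suppose that $X, X_i$, $i = 1, \dots$, are independent identically distributed
random variables. Let

\[ \bar X = \frac{1}{n}\sum_{i=1}^n X_i, \qquad s_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2. \]

If $X$ satisfies $E|X|^4 < \infty$, $E(X^2) = \sigma^2 > 0$ and $E(X) = 0$, then

\[ P\Bigl( \Bigl| \frac{\sqrt{n}(\bar X + c)}{s_n} \Bigr| > t \Bigr) = P(|Z + c\sqrt{n}/\sigma| > t)(1 + o(1)) \tag{A.11} \]

uniformly in $c$ and $t = o(n^{1/6})$. Here, and in the sequel, $Z$ denotes a standard normal
random variable.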

Proof. When $t$ is bounded, (A.11) follows from Lemma A.2. Consider large $t$ with $t =
o(n^{1/6})$. We need the following result of [27, 28]:

\[ P\Bigl( \frac{\sqrt{n}(\bar X + c)}{s_n} > t \Bigr) = (1 - \Phi(t - c\sqrt{n}/\sigma))(1 + o(1)) \tag{A.12} \]

uniformly in $|c\sqrt{n}/\sigma| \le t/5$ and $t = o(n^{1/6})$. We note that, following the same lines as
their proof, we can see that (A.12) remains valid for $-t/5 \le c\sqrt{n}/\sigma \le t$. We write

\[ P\Bigl( \Bigl| \frac{\sqrt{n}(\bar X + c)}{s_n} \Bigr| > t \Bigr) = P\Bigl( \frac{\sqrt{n}(\bar X + c)}{s_n} > t \Bigr) + P\Bigl( \frac{\sqrt{n}(-\bar X - c)}{s_n} > t \Bigr). \]

By (A.12), the remark above and the fact that

\[ 1 - \Phi(t + x) = o(1 - \Phi(t - x)) \]

for $x \ge 1$ (recall here that we assume $t$ is large), (A.11) holds for $-t \le c\sqrt{n}/\sigma \le t$. Now,
assume $|c|\sqrt{n}/\sigma > t$. Then, by (A.2),

\[ P\Bigl( \Bigl| \frac{\sqrt{n}(\bar X + c)}{s_n} \Bigr| > t \Bigr) - P(|Z + c\sqrt{n}/\sigma| > t) = o(1). \]

Since $|c|\sqrt{n}/\sigma > t$, we have $P(|Z + c\sqrt{n}/\sigma| > t) \ge 1/2$ and hence

\[ P\Bigl( \Bigl| \frac{\sqrt{n}(\bar X + c)}{s_n} \Bigr| > t \Bigr) = P(|Z + c\sqrt{n}/\sigma| > t)(1 + o(1)). \]

This completes the proof of (A.11). □
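The lemma below shows that the critical value defined in (A.26) under independence is bounded.

Lemma A.4. Assume that there exist $\varepsilon_0 > 0$ and $c_0 > 0$ such that

\[ P(|\sqrt{n}\mu_1/\sigma_1| \ge \varepsilon_0) \ge c_0. \tag{A.13} \]

Let $t_{n,m}$ satisfy (A.37). Then

\[ t_{n,m} \le t_0, \tag{A.14} \]

where $t_0$ is the solution of

\[ \alpha\pi_1 c_0 \exp((t_0 - \varepsilon_0)\varepsilon_0) = 12(1 + t_0 - \varepsilon_0). \tag{A.15} \]

Proof. It suffices to show that

\[ \sqrt{m}\,E\xi_1(t_0) \ge (\mathrm{var}(\xi_1(t_0)))^{1/2} z_\gamma. \tag{A.16} \]

It is easy to see that $P(|Z + a| > t_0)$ is a monotone increasing function of $a > 0$. Hence,

\[ \begin{aligned}
P(|Z + \sqrt{n}\mu_1/\sigma_1| > t_0) &\ge P(|Z + \sqrt{n}\mu_1/\sigma_1| > t_0,\ |\sqrt{n}\mu_1/\sigma_1| \ge \varepsilon_0) \\
&\ge P(|Z + \varepsilon_0| > t_0)\,P(|\sqrt{n}\mu_1/\sigma_1| \ge \varepsilon_0) \\
&\ge c_0 P(|Z + \varepsilon_0| > t_0) \ge c_0(1 - \Phi(t_0 - \varepsilon_0)) \\
&\ge \frac{c_0}{6(1 + t_0 - \varepsilon_0)} \exp(-(t_0 - \varepsilon_0)^2/2) \\
&\ge \frac{c_0}{6(1 + t_0 - \varepsilon_0)} \exp(-t_0^2/2 + (t_0 - \varepsilon_0)\varepsilon_0).
\end{aligned} \tag{A.17} \]

Here, we use the fact that

\[ \tfrac{1}{2} e^{-x^2/2} \ge 1 - \Phi(x) \ge \frac{1}{6(1+x)} e^{-x^2/2} \quad \text{for } x \ge 0. \]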

Under the null hypothesis $H_i = 0$, which corresponds to $\mu_i = 0$, we apply Lemma A.3
and obtain

\[ P(|T_i| > t \mid H_i = 0) = P(|Z| > t)(1 + o(1)) \tag{A.18} \]

uniformly in $t = o(n^{1/6})$.

Under the alternative hypothesis $H_i = 1$, we apply Lemma A.3 to $X_{ij} - \mu_i$ and obtain

\[ \begin{aligned}
P(|T_i| > t \mid H_i = 1) &= P(|\sqrt{n}(\bar X_i - \mu_i + \mu_i)/s_i| > t \mid H_i = 1) \\
&= E[P(|Z + \sqrt{n}\mu_i/\sigma_i| > t \mid \mu_i, \sigma_i)](1 + o(1)) \\
&= P(|Z + \sqrt{n}\mu_i/\sigma_i| > t)(1 + o(1))
\end{aligned} \tag{A.19} \]

uniformly in $t = o(n^{1/6})$.
Also, note that

\[ \begin{aligned}
P(|T_i| > t) &= P(|T_i| > t, H_i = 0) + P(|T_i| > t, H_i = 1) \\
&= (1 - \pi_1)P(|T_i| > t \mid H_i = 0) + \pi_1 P(|T_i| > t \mid H_i = 1) \\
&= (1 - \pi_1)P(|Z| > t)(1 + o(1)) + \pi_1 P(|Z + \sqrt{n}\mu_i/\sigma_i| > t)(1 + o(1)).
\end{aligned} \tag{A.20} \]
By (A.34), (A.18), (A.20) and (A.17), 

^a(io) = «(1 - 7Tl)P{\Z\ > to)(l +0(1)) + OTTlPdZ + V^^ll/<Jl\ > to)(l +0(1)) 

-(l-7ri)P(|Z|>to)(l+o(l)) 

^ "^1 an , ? V exp(-io/2 + (^o - eo)£o) - 2P{Z > to) 

+ (A.21) 
^6(T^^f^-P(-*o/2 + (t„-.o)s^^ 



= e-*o/2, 

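by (A.15) and the definition of $t_0$. It is easy to see that $E\xi_1^2 \le 1$ and, in
particular, $\mathrm{var}(\xi_1(t_0)) \le 1$. Thus, by (A.21),

\[ \frac{\sqrt{m}\,E\xi_1(t_0)}{(\mathrm{var}(\xi_1(t_0)))^{1/2}} \ge \sqrt{m}\,e^{-t_0^2/2} \ge z_\gamma, \tag{A.22} \]

provided that $m$ is large enough. This proves (A.16). □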

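The following i.i.d. results are essential for the general results.

Lemma A.5. Assume the conditions of Theorem 2.1 with (2.2) replaced by the assumption
that $(T_i, H_i)$, $i = 1, \dots, m$, are i.i.d. and $\pi_1 = P(H_i = 1)$. Let $J = \{i : H_i = 1\}$ be the
set that contains the indices of alternative hypotheses. Also, assume that $(\mu_i, \sigma_i)$ are i.i.d.
for $i \in J$. Let

\[ p(t) = P(|T_i| > t), \tag{A.23} \]
\[ a_1(t) = \alpha p(t) - (1 - \pi_1)F_0(t) \tag{A.24} \]

and

\[ b_1^2(t) = \alpha^2 p(t)(1 - p(t)) + 2\alpha(1 - \pi_1)p(t)F_0(t) + (1 - \pi_1)F_0(t)(1 - 2\alpha - (1 - \pi_1)F_0(t)). \tag{A.25} \]

(i) If $t^{FDTP}_{n,m}$ is chosen such that

\[ t^{FDTP}_{n,m} = \inf\{ t : \sqrt{m}\,a_1(t)/b_1(t) \ge z_\gamma \}, \]

then

\[ \lim_{m\to\infty} P(\mathrm{FDP} > \alpha) = \lim_{m\to\infty} P(V > \alpha R) \le \gamma \]

holds.

(ii) If $t^{FDR}_{n,m}$ is chosen such that

\[ t^{FDR}_{n,m} = \inf\{ t : (1 - \pi_1)F_0(t)/p(t) \le \gamma \}, \]

then

\[ \lim_{m\to\infty} \mathrm{FDR} = \lim_{m\to\infty} E(V/R) \le \gamma \]

holds.

(iii) If $t^{k\text{-}FWER}_{n,m}$ is chosen such that

\[ t^{k\text{-}FWER}_{n,m} = \inf\{ t : P(\eta(t) \ge k) \le \gamma \}, \]

where $\eta(t) \sim \mathrm{Poisson}(\theta(t))$ and $\theta(t) = m(1 - \pi_1)F_0(t)$, then

\[ \lim_{m\to\infty} k\text{-FWER} = \lim_{m\to\infty} P(V \ge k) \le \gamma \]

holds.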

Proof, We first prove the i.i.d. case for one-sample t-statistics. By (2.3), 

m m 

aR~v = a'^i{m\>t} - X!^-^ ~ ^^)hm\>t} 

4=1 i=l 
m 

= ^(Fj + a- l)/{|T.|>t} 

1=1 

m m 

= ^^h\Ti\>t}hHi = l} ^)h\Ti\>t}hH^=0} 
2=1 i=l 

m m 

= ^^h\Ti\>t}i^ - hH^=o}) + ^{0^ - ^)i{m\>t}hH, 
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m 

= '^io'hm\>t} - hm\>t}hH,=o}) 

m 

where 

£.1 := Ci{t) = "-^{|T,|>t} - ^{m\>t}I{Hi=o} 
is obviously a Donsker class indexed by t [15]. Hence,

PiV>aR) = P e. (t) < 0^ . ( A.32) 

Note that since £i are independent random variables, we can apply the uniform central 
limit theorem to choose t so that 

p(^m<oj<l- (A.33) 

To this end, we need the mean and variance of Without loss of generality, we use 
as an example, since £i are i.i.d. random variables. Thus, 

E^i = aPQTil >t)- P{\Ti\ >t,Hi= 0) 

= aP{\Ti\ >t)- P{Hi = 0)P(|Ti| > t\Hi = 0) (A.34) 
= aPi\Ti\ > t) - (1 - 7ri)P(|Ti| > t\Hi = 0). 



Similarly, 



and 



E^f = £;(a2/{|Ti|>t} + (1 - 2a)/{|Ti|>t}/{Hi=o}) 

= a^Pi\Ti\ >*) + (!- 2a)(l - 7ri)P(|Ti| > t\Hi = 0) 

var(Ci) = i?e?-(i?ei)' 

= a^Pi\Ti\ >t) + il- 2a){l - 7ri)P(|Ti| > t\Hi = 0) 

- {aPdTil > t) - (1 - Tri)P{\Ti\ > t\Hi = O)}^ 
= a^Pm\>t){l-P{\T,\>t)) 

+ (1 - > t\Hi = 0)(1 - 2a - (1 - 7ri)F(|Ti| > t\H^ = 0)) 

+ 2a(l - 7ri)P(|ri| > t)P{\Ti\ > t\Hi - 0). 



(A.35) 



(A.36) 
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Now, define 



>z^\. (A.37) 



(var(ei(t)))V2 \ 

By Lemma A.4, $t_{n,m}$ is bounded and hence the uniform central limit theorem yields



< -- 



^^2— 1 (^n,m) -E^iiin.m^^ 

(E"ivar(C.(i„,™)))i/2 
(Er=ivar(6(i„,,„)))i/2 



This proves (A.27). 
Note that 



/ P{FDTP>x)dx 
Jo 

= I P{V>xR)dx 
Jo 

Jo \ VVar^i y 



(A.38) 



Letting $m \to +\infty$, $P(N(0,1) \le -\sqrt{m}\,E\xi_1/\sqrt{\mathrm{Var}\,\xi_1})$ is either 0 or 1, depending on the sign
of $E\xi_1$. Thus, the range of $x$ that makes this probability 1 satisfies

\[ E\xi_1 = xP(|T_i| > t) - (1 - \pi_1)P(|T_i| > t \mid H_i = 0) < 0 \]

and the corresponding $x < (1 - \pi_1)P(|T_i| > t \mid H_i = 0)/P(|T_i| > t)$. In order to control
FDR at level $\gamma$, we require

\[ \frac{(1 - \pi_1)P(|T_i| > t \mid H_i = 0)}{P(|T_i| > t)} \le \gamma. \]

This proves (A.28).

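For the k-FWER, we use the characteristic function method. Let $\eta_i = (1 - H_i)I\{|T_i| > t\}$. Then

\[ E\exp\Bigl(\mathrm{i}u\sum_{i=1}^m \eta_i\Bigr) = \prod_{i=1}^m \bigl[e^{\mathrm{i}u}(1 - \pi_1)F_0 + 1 - (1 - \pi_1)F_0\bigr] = \Bigl[1 + \frac{1}{m}\,m(1 - \pi_1)F_0(e^{\mathrm{i}u} - 1)\Bigr]^m \to e^{\lambda(e^{\mathrm{i}u} - 1)}, \]

where $m_0F_0 \to \lambda$ as $m \to \infty$ and $\lambda$ is the parameter for the Poisson distribution such
that

\[ P(\mathrm{Poiss}(\lambda) \ge k) \le \gamma. \] □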
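The Poisson limit for the count of false rejections can be checked numerically. The Python sketch below is an illustrative check only, with hypothetical function names and illustrative parameter values: it compares the exact tail of a Binomial(m, lambda/m) count with the tail of a Poisson(lambda) variable for large m.

```python
import math

def binomial_tail(m, p, k):
    """P(Binomial(m, p) >= k), from one minus the sum of the first k terms."""
    cdf = sum(math.comb(m, j) * p**j * (1.0 - p) ** (m - j) for j in range(k))
    return 1.0 - cdf

def poisson_tail(lam, k):
    """P(Poisson(lam) >= k), from one minus the sum of the first k terms."""
    cdf = sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k))
    return 1.0 - cdf

m, lam, k = 10_000, 5.0, 10  # illustrative values, not taken from the paper
print(binomial_tail(m, lam / m, k), poisson_tail(lam, k))
```

The two tail probabilities agree closely when m is large and the per-test rejection probability is of order 1/m, which is the regime of the characteristic-function argument.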
The following functional central limit theorem is needed in the proof of Theorem 2.1:

Lemma A.6. Suppose the triangular array $\{f_{ni}(\omega, t),\ i = 1, \dots, m_n,\ t \in T\}$ consists of
independent processes within rows and is almost measurable Suslin (AMS) (see
page 25 in [15]). Let

\[ X_n(\omega, t) = \sum_{i=1}^{m_n} [f_{ni}(\omega, t) - Ef_{ni}(\cdot, t)]. \tag{A.39} \]

Assume:

(A) the $\{f_{ni}\}$ are manageable, with envelopes $\{F_{ni}\}$ which are also independent within
rows;

(B) $H(s, t) = \lim_{n\to\infty} EX_n(s)X_n(t)$ exists for every $s, t \in T$;

(C) $\limsup_{n\to\infty} \sum_{i=1}^{m_n} E^*F_{ni}^2 < \infty$;

(D) $\lim_{n\to\infty} \sum_{i=1}^{m_n} E^*F_{ni}^2 1\{F_{ni} > \varepsilon\} = 0$ for each $\varepsilon > 0$;

(E) $\rho(s, t) = \lim_{n\to\infty} \rho_n(s, t)$, where

\[ \rho_n(s, t) = \Bigl( \sum_{i=1}^{m_n} E|f_{ni}(\cdot, s) - f_{ni}(\cdot, t)|^2 \Bigr)^{1/2}, \]

exists for every $s, t \in T$ and, for all deterministic sequences $\{s_n\}$ and $\{t_n\}$ in $T$,
if $\rho(s_n, t_n) \to 0$, then $\rho_n(s_n, t_n) \to 0$.

Then $X_n$ converges weakly on $\ell^\infty(T)$ to a tight mean-zero Gaussian process $X$ concentrated
on $UC(T, \rho)$, with covariance $H(s, t)$.

Proof. The definitions involved in this lemma and the proof can be found in [15],
Theorem 11.16. Below, we verify that, conditional on $\mathcal{H}$, $f_{ni}(\omega, t) = \xi_i(\omega, t)/\sqrt{m}$ satisfy the
conditions in Lemma A.6. Since $\xi_i(\omega, t)$ is the difference between two monotone bounded
functions, it is clear that, conditional on $\mathcal{H}$, $\xi_i(\omega, t)/\sqrt{m}$ is AMS, manageable and has
envelopes $\alpha/\sqrt{m}$. Also,

EX„{s)Xn{t) = EE[X^{s)X,,{t)\n] 
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EE 



EE 



m y/m 

- Ec.is)n)im\n Emm 



i=l i=l 

-EY,{a^H, + (1 - af{l - Hi))EI{\T^\>t^s\H} 

m 

- ^[o^Hi + (1 - a){l - H^)fEI{\T,\>sU}EI{\T,\>t\H} 

i=l 
^ ?n 

-Ey{a^H,Fi{t\Js) + (1 - af{l - H,)FoitUs)) 

m ^ — ^ 



i=l 



- ^[a'-ff. + (1 - af{l - H,)][H,Fi{s) + (1 - H,)Fo{s)] 

i=l 

X [H,Fi{t) + {l-H,)Fo{t)] 

- ?n 

-Ey [a^H, (Fi {t U .s) - {t)Fi (s)) 



+ (1 - - Hi){F„{tUs) - Fo{t)Fois))] 

^ nia\F,{t Us) - Fi{t)Fi{s)) + (1 - 7ri)(l - a)2(i^o(i Us) - Fo{t)Fo{s)) 
= H{s,t), 

which is the same as q^(t) when s = t. (C) is easily satisfied. For aU e > 0, there exists an 

A^o such that a/No < e, so hni,„^oo Z^ZLi Ea'^ /ml{a/y/m > e} = hni„i_yoo Z^ilV^ /m = 
0, which verifies (D). Similarly, we can show that (E) is satisfied and thus the functional 
central limit theorem holds. □ 



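Let

\[ \begin{aligned}
G(t) &= \alpha\pi_1 EP(|Z + \sqrt{n}\mu_1/\sigma_1| > t) - (1 - \alpha)(1 - \pi_1)P(|Z| > t) \\
&= \alpha\pi_1 EP(|Z + \sqrt{n}|\mu_1|/\sigma_1| > t) - (1 - \alpha)(1 - \pi_1)P(|Z| > t)
\end{aligned} \]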



and 



The following lemma is needed in the proof of consistency. 



(A.40)

Lemma A.7. Assume that $0 < \pi_1 < 1 - \alpha$ and (A.13) is satisfied. Then

\[ G(t) \begin{cases} < 0 & \text{for } t < t_1, \\ = 0 & \text{for } t = t_1, \\ > 0 & \text{for } t > t_1. \end{cases} \tag{A.41} \]

Moreover, $G'(t_1) \ge e^{-t_0^2/2}/\sqrt{2\pi}$.

Proof. We first observe that $0 < t_1 < t_0$ by the fact that $G(0) < 0$, $G(t_0) \ge e^{-t_0^2/2} > 0$ in
(A.21) and $G(t)$ is a continuous function.

To prove (A.41), it suffices to show that there exists a $t_2 > 0$ such that $G(t)$ is increasing
in $[0, t_2]$ and decreasing in $[t_2, \infty)$. To this end, consider the derivative of $G$:

G'{t) = ~aTriE{(f>{t - V^\fii\/ai) + (j){t + + 2(1 - a){l - ^i)0(t) 

'^^'f an,E(e.,(^^ + ^^)+cJ^^^^^)) (A42) 



c 



27t I V V 2crf tTi y V 2af (71 

+ 2(l-a)(l-^i) 



Let 



Hit) = ^a.,E[eM-^^+^^ 

2al ^ (71 



cxp(-^-^^^) )+2(l-a)(l-.i). 



Then 

H'{t) = -aniE 



( y/n\fii\ f y/n\fj,i\t n^W /ii / y/n\^jLi\t n^l 
\ exp exp 

[ (Tl V ^1 2(7fy (71 V (71 2(7f 



CTi V V / V en 



(A.43) 



for all $t > 0$. Therefore, $H(t)$ is monotone decreasing. Taking into account the facts that
$H(0) > 0$ by the assumption $\pi_1 < 1 - \alpha$ and $H(+\infty) < 0$, we conclude that $H(t)$ has only
one zero point, say, $t_2$. Moreover, $H(t) > 0$ for $t < t_2$ and $H(t) < 0$ for $t > t_2$. This is also
true for $G'(t)$, by (A.42). Hence, $G(t)$ is increasing for $t < t_2$ and decreasing for $t > t_2$.
Note that since $G(0) < 0$, $G(t_0) > 0$ and $G(+\infty) = 0$, we can see that $G(t)$ has a unique
zero point $t_1$ and $t_2 > t_1$. Since $G(t)$ is increasing for $0 \le t \le t_2$, we have $G'(t_1) \ge 0$. We
now prove that $G'(t_1) \ge e^{-t_0^2/2}/\sqrt{2\pi}$. It follows from the proof of (A.21) that

\[ G(t_0) \ge e^{-t_0^2/2}. \tag{A.44} \]
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— /2 

Recalling that G'{t) = H(t) and H is decreasing, we have 

G(to) = G(io)-G(ii)= f"G'{s)ds< f ' ^^H{h)ds 
Jti Jti v27t 

(A.45) 

< H{ti){l - $(ii)) < iJ(ti)e-*i/2 ^ G'(t^)y2^. 
This proves G'(<i) > e-*o/V^/27T. □ 

A. 2. Proof of Theorem 2.1 

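We now return to show our main theorem under dependence. Let $\mathcal{H} = \{H_i, 1 \le i \le m\}$.
To prove (i), following along the same lines as the proof of Lemma A.5, we need to obtain
the asymptotic distribution of

\[ P(V > \alpha R) = P\Bigl( \sum_{i=1}^m \xi_i(t) < 0 \Bigr), \tag{A.46} \]

where

\[ \xi_i(t) = \alpha I\{|T_i| > t\} - I\{|T_i| > t\}I\{H_i = 0\} = (\alpha + H_i - 1)I\{|T_i| > t\} = [\alpha H_i - (1 - \alpha)(1 - H_i)]I\{|T_i| > t\}. \]

Note that

\[ P(|T_i| > t \mid \mathcal{H}) = (1 - H_i)P(|T_i| > t \mid H_i = 0) + H_iP(|T_i| > t \mid H_i = 1). \]

Given $\mathcal{H}$, $\xi_i$, $1 \le i \le m$, are independent random variables. The conditional mean equals

\[ \begin{aligned}
E\Bigl( \sum_{i=1}^m \xi_i \Bigm| \mathcal{H} \Bigr) &= \sum_{i=1}^m \{ \alpha E(I\{H_i = 0\} \mid \mathcal{H})P(|T_i| > t \mid H_i = 0) + \alpha E(I\{H_i = 1\} \mid \mathcal{H})P(|T_i| > t \mid H_i = 1) \\
&\qquad - E(I\{H_i = 0\} \mid \mathcal{H})P(|T_i| > t \mid H_i = 0) \} \\
&= \sum_{i=1}^m \{ -(1 - \alpha)(1 - H_i)P(|T_i| > t \mid H_i = 0) + \alpha H_iP(|T_i| > t \mid H_i = 1) \} \\
&= \alpha \sum_{i=1}^m \{ H_iP(|T_i| > t \mid H_i = 1) \} - (1 - \alpha) \sum_{i=1}^m \{ (1 - H_i)P(|T_i| > t \mid H_i = 0) \} \\
&= \alpha m_1F_1(t) - (1 - \alpha)m_0F_0(t).
\end{aligned} \]

Next, we calculate the conditional variance of $\sum_{i=1}^m \xi_i$ given $\mathcal{H}$:

\[ \begin{aligned}
\mathrm{var}\Bigl( \sum_{i=1}^m \xi_i \Bigm| \mathcal{H} \Bigr) &= \mathrm{var}\Bigl( \sum_{i=1}^m [\alpha H_i - (1 - \alpha)(1 - H_i)]I\{|T_i| > t\} \Bigm| \mathcal{H} \Bigr) \\
&= \sum_{i=1}^m (\alpha^2H_i + (1 - \alpha)^2(1 - H_i))\,\mathrm{var}(I\{|T_i| > t\} \mid \mathcal{H}) \\
&= \alpha^2m_1F_1(t)(1 - F_1(t)) + (1 - \alpha)^2m_0F_0(t)(1 - F_0(t)).
\end{aligned} \]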

From (2.7) and (2.8), 

Aim(i) r- t^m{t)/m 
— 'm- 



By the fact that $m_1/m \to \pi_1$ a.s., we have

\[ \mu_m(t)/m \to \alpha\pi_1F_1(t) - (1 - \alpha)(1 - \pi_1)F_0(t) \quad \text{a.s.} \tag{A.47} \]

and

\[ \sigma_m^2(t)/m \to \alpha^2\pi_1F_1(t)(1 - F_1(t)) + (1 - \alpha)^2(1 - \pi_1)F_0(t)(1 - F_0(t)) = q^2(t) \quad \text{a.s.,} \]

which is smaller than $\mathrm{var}(\xi_1(t))$, due to the fact that

\[ \mathrm{var}\,X = E(\mathrm{var}(X \mid Y)) + \mathrm{var}(E(X \mid Y)) \]

for any two random variables $X$ and $Y$. By (A.16), we can see that the critical value
defined at (2.9) is bounded. Thus, conditional on $\mathcal{H}$, we can use the functional central
limit theorem on $\sum_{i=1}^m \xi_i/\sqrt{m}$ by virtue of Lemma A.6. The limit is a Gaussian process
with continuous sample paths. Hence,



n 



< i?{P(A^(0,l)<z(t)<-z^g(i))} 
->P(iV(0, l)<-z^)=7 asm^oo. 

This proves (2.9). 

(ii) can be proven similarly. The characteristic function method can be used to prove
(iii).

A.3. Proof of Theorem 2.2

We first prove (i); (ii) follows along the same lines as the independent case, plus
a conditional argument. Without loss of generality, we use $T_1$ as a representative that
comes from the alternative. We have to show that

\[ |\hat t_{n,m} - t_{n,m}| = o(1) \quad \text{a.s.} \tag{A.49} \]

We first prove that

\[ |\hat t_{n,m} - t_1| = o(1) \quad \text{a.s.,} \tag{A.50} \]

where $t_1$ is defined as in (A.40). It suffices to show that for any $\varepsilon > 0$,

^";"^^^+/) > z, (A.51) 

and 



Tm{s) 



< for aU s<ti~ e. (A. 52) 



Recall that $p_m(t) = \frac{1}{m}\sum_{i=1}^m I\{|T_i| > t\}$. Given $\mathcal{H}$, by the uniform law of the iterated
logarithm (see, e.g., [10]),

\[ p_m(t) - \frac{1}{m}\sum_{i=1}^m \{ (1 - H_i)F_0(t) + H_iF_1(t) \} = o(m^{-1/2}(\log\log m)^{1/2}) \quad \text{a.s.} \]

By the strong law of large numbers,

\[ \frac{1}{m}\sum_{i=1}^m \{ (1 - H_i)F_0(t) + H_iF_1(t) \} \to (1 - \pi_1)F_0(t) + \pi_1F_1(t) \quad \text{a.s.} \tag{A.53} \]

So

\[ p_m(t) \to (1 - \pi_1)F_0(t) + \pi_1F_1(t) \quad \text{a.s.} \]

Recall that

\[ \nu_m(t) = \alpha p_m(t) - 2(1 - \hat\pi_1)\bar\Phi(t). \]

By (A.2), our strongly consistent estimate $\hat\pi_1$ described in Section 2.3 and the continuous
mapping theorem, we have

\[ \sup_t |\nu_m(t) - \{ \alpha((1 - \pi_1)F_0(t) + \pi_1F_1(t)) - (1 - \pi_1)P(|Z| > t) \}| \to 0 \quad \text{a.s.,} \tag{A.54} \]

which, together with (A.20) and the definition of $G$, implies that

\[ \sup_{0 \le t \le 1 + t_0} |\nu_m(t) - G(t)| \to 0 \quad \text{a.s.} \tag{A.55} \]

In particular, since $G(t_1 + \varepsilon) > 0$ for $0 < \varepsilon < t_2 - t_1$, we have

\[ \nu_m(t_1 + \varepsilon) \ge G(t_1 + \varepsilon)/2 \quad \text{a.s.} \tag{A.56} \]

for sufficiently large $m$ and, therefore, $\sqrt{m}\,\nu_m(t_1 + \varepsilon) > z_\gamma\tau_m(t_1 + \varepsilon)$. This proves (A.51).
Similarly, since $G(t)$ is increasing and $G(t_1 - \varepsilon) < 0$, we have

\[ \max_{s \le t_1 - \varepsilon} \nu_m(s) \le G(t_1 - \varepsilon)/2 \quad \text{a.s.} \tag{A.57} \]

for sufficiently large $m$. Hence, (A.52) holds. This proves (A.50).
Following the same lines as the proof of (A.50), we have

\[ |t_{n,m} - t_1| = o(1). \tag{A.58} \]

This completes the proof of (A.49).

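For $k$-FWER, let $\eta_0$ be the number that satisfies $P(\mathrm{Poiss}(\eta_0) \ge k) \le \gamma$. Let $t_{0,m} = t_{n,m}^{k\text{-FWER}}$
and $t_m = \hat t_{n,m}^{k\text{-FWER}}$. Thus, by definition, $t_{0,m}$ is the $t$ that satisfies
$(1 - \pi_1)mF_0(t) = \eta_0$ and $t_m$ is the $t$ that satisfies $2(1 - \hat\pi_1)m\bar\Phi(t) = \eta_0$. We then have
$(1 - \hat\pi_1)/(1 - \pi_1) \to 1$, which implies that

\[ \frac{F_0(t_{0,m})}{2\bar\Phi(t_m)} = \frac{1 - \hat\pi_1}{1 - \pi_1} = 1 + o_P(1), \]

\[ \frac{\bar\Phi(t_{0,m})(1 + O(n^{-1/2}))}{\bar\Phi(t_m)} = 1 + o_P(1), \]

\[ \frac{\bar\Phi(t_{0,m})}{\bar\Phi(t_m)} = 1 + o_P(1), \]

\[ \frac{t_m}{t_{0,m}}\, e^{-t_{0,m}^2/2 + t_m^2/2} = 1 + o_P(1). \]

Hence, $R = t_m/t_{0,m} \to 1$ in probability. Thus,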



io,™-4 = op(l) =^ \to,m-U^ \. I ^Op((logm) 1/2) 

-L + to,m + tm 



__Op(l) 

1 + 

since = 0p(n^/^) and logjn = o{n^^^) 
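
The two defining equations for the $k$-FWER critical values can be solved numerically. The following is a minimal sketch, not the authors' implementation. It assumes that the Poisson condition reads $P(\mathrm{Poiss}(\eta_0) \ge k) \le \gamma$ with $\eta_0$ taken as large as possible, that $\pi_1$ (or an estimate of it) is supplied by the caller, and that the normal tail is $\bar\Phi(t) = 1 - \Phi(t)$. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_upper_tail(eta: float, k: int) -> float:
    """Return P(N >= k) for N ~ Poisson(eta), computed as 1 - P(N <= k - 1)."""
    term = math.exp(-eta)  # P(N = 0)
    cdf = 0.0
    for j in range(k):
        cdf += term            # add P(N = j)
        term *= eta / (j + 1)  # advance to P(N = j + 1)
    return 1.0 - cdf


def solve_eta0(k: int, gamma: float, tol: float = 1e-12) -> float:
    """Bisection for the largest eta with P(Poisson(eta) >= k) <= gamma.

    The upper tail is increasing in eta, so the feasible set is an interval [0, eta0].
    """
    lo, hi = 0.0, 1.0
    while poisson_upper_tail(hi, k) <= gamma:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_upper_tail(mid, k) <= gamma:
            lo = mid
        else:
            hi = mid
    return lo


def normal_critical_value(eta0: float, m: int, pi1: float) -> float:
    """Solve 2 * (1 - pi1) * m * (1 - Phi(t)) = eta0 for t."""
    tail = eta0 / (2.0 * (1.0 - pi1) * m)
    if not 0.0 < tail < 0.5:
        raise ValueError("eta0 / (2 (1 - pi1) m) must lie in (0, 1/2)")
    return -NormalDist().inv_cdf(tail)  # upper-tail quantile of N(0, 1)
```

For example, `normal_critical_value(solve_eta0(1, 0.05), m=10000, pi1=0.1)` gives the normal-approximation threshold for ordinary FWER control at level 0.05 with 10000 hypotheses.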
In this section, we give the proof of the rate of convergence for the i.i.d. case by using
the one-sample t-statistic. Let $p(t) = P(|T_i| > t)$ and let

\[ p_m(t) = \frac{1}{m}\sum_{i=1}^{m} 1\{|T_i| > t\}. \]

By the Glivenko-Cantelli theorem,

\[ \sup_t |p_m(t) - p(t)| \to 0 \quad \text{a.s.} \tag{A.59} \]

and, by the Donsker theorem,

\[ \sup_t |p_m(t) - p(t)| = O(m^{-1/2}) \quad \text{in probability.} \tag{A.60} \]

By the uniform law of the iterated logarithm,

\[ \sup_t |p_m(t) - p(t)| = O(m^{-1/2}(\log\log m)^{1/2}) \quad \text{a.s.} \tag{A.61} \]
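
The $m^{-1/2}$ scale in (A.60) can be seen in a small simulation. The sketch below is only an illustration under a simplifying assumption: the statistics are drawn as exact standard normals, so that $p(t) = P(|Z| > t)$ is known in closed form. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sup_deviation(abs_stats, tail_prob):
    """Return sup_t |p_m(t) - p(t)| for p_m(t) = (1/m) * #{i : |T_i| > t}.

    p_m is a step function that only jumps at the observed values, so it is
    enough to compare it with p just below and at each sorted observation.
    """
    values = sorted(abs_stats)
    m = len(values)
    worst = 0.0
    for j, s in enumerate(values):  # j = 0, ..., m - 1
        p = tail_prob(s)
        before = (m - j) / m        # p_m(t) for t just below s
        at = (m - j - 1) / m        # p_m(s) itself
        worst = max(worst, abs(before - p), abs(at - p))
    return worst


def demo(seed=0):
    """Return sqrt(m) * sup_t |p_m(t) - p(t)| for a few values of m."""
    rng = random.Random(seed)
    z = NormalDist()

    def tail(t):
        return 2.0 * z.cdf(-t)      # P(|Z| > t)

    scaled = {}
    for m in (500, 5000, 50000):
        sample = [abs(rng.gauss(0.0, 1.0)) for _ in range(m)]
        scaled[m] = math.sqrt(m) * sup_deviation(sample, tail)
    return scaled
```

The scaled deviations returned by `demo()` stay of order one as $m$ grows, which is the behaviour that (A.60) describes.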

We define strongly consistent estimators of $E\xi_i(t)$ and $\mathrm{var}(\xi_i(t))$ by $\nu_m(t)$ and $\tau_m^2(t)$,
respectively, where

\[ \nu_m(t) = \alpha p_m(t) - (1 - \pi_1)P(|Z| > t) \tag{A.62} \]

and

\[ \tau_m^2(t) = \alpha^2 p_m(t)(1 - p_m(t)) + 2\alpha(1 - \pi_1)p_m(t)P(|Z| > t)
 + (1 - \pi_1)P(|Z| > t)\{1 - 2\alpha - (1 - \pi_1)P(|Z| > t)\}. \tag{A.63} \]
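
The form of the variance estimator can be checked with a short calculation. The derivation below assumes that $\xi_i(t) = \alpha 1\{|T_i| > t\} - (1 - H_i)1\{|T_i| > t\}$, a definition that is not restated in this section and is used here only as a working hypothesis. The symbols $A$, $B$, $p$ and $q$ are introduced for this illustration.

```latex
\begin{aligned}
A &= 1\{|T_i| > t\}, \quad B = (1 - H_i)A, \quad p = P(A = 1), \quad q = P(B = 1), \\
\operatorname{var}(A) &= p(1 - p), \quad \operatorname{var}(B) = q(1 - q), \quad
\operatorname{cov}(A, B) = E(AB) - pq = q(1 - p) \quad (\text{since } AB = B), \\
\operatorname{var}(\alpha A - B) &= \alpha^2 p(1 - p) + q(1 - q) - 2\alpha q(1 - p) \\
&= \alpha^2 p(1 - p) + 2\alpha p q + q(1 - 2\alpha - q).
\end{aligned}
```

Substituting $p_m(t)$ for $p$ and $(1 - \pi_1)P(|Z| > t)$ for $q$ gives the three terms of (A.63).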

We now define an estimator of $t_{n,m}$ by

\[ \hat t_{n,m} = \inf\Big\{t : \frac{\sqrt{m}\,\nu_m(t)}{\tau_m(t)} > z_\gamma\Big\}. \tag{A.64} \]
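
The estimator in (A.64) can be approximated by scanning a grid of candidate thresholds. The following is a minimal sketch under stated assumptions, not the authors' code: $z_\gamma$ is taken to be the upper $\gamma$ quantile of the standard normal distribution, $\pi_1$ is passed in as a known or separately estimated value, and the infimum is replaced by the first grid point that satisfies the inequality. The names `nu_tau` and `critical_value` and the grid parameters are hypothetical.

```python
import bisect
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def nu_tau(t, sorted_abs, alpha, pi1):
    """Return (nu_m(t), tau_m(t)) as in (A.62) and (A.63)."""
    m = len(sorted_abs)
    p_m = (m - bisect.bisect_right(sorted_abs, t)) / m  # fraction with |T_i| > t
    q = (1.0 - pi1) * 2.0 * STD_NORMAL.cdf(-t)          # (1 - pi1) * P(|Z| > t)
    nu = alpha * p_m - q
    var = (alpha ** 2 * p_m * (1.0 - p_m)
           + 2.0 * alpha * p_m * q
           + q * (1.0 - 2.0 * alpha - q))
    return nu, math.sqrt(max(var, 0.0))                 # guard against tiny negatives


def critical_value(abs_stats, alpha, gamma, pi1, t_max=6.0, step=0.01):
    """First grid point t with sqrt(m) * nu_m(t) / tau_m(t) > z_gamma, or None."""
    sorted_abs = sorted(abs_stats)
    m = len(sorted_abs)
    z_gamma = STD_NORMAL.inv_cdf(1.0 - gamma)           # assumed meaning of z_gamma
    for k in range(int(round(t_max / step)) + 1):
        t = k * step
        nu, tau = nu_tau(t, sorted_abs, alpha, pi1)
        if tau > 0.0 and math.sqrt(m) * nu > z_gamma * tau:
            return t
    return None
```

A grid scan is used because the ratio in (A.64) is a step function of $t$ through $p_m(t)$, so a root finder that assumes continuity would not be appropriate.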
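For FDTP, we have to show that

\[ |\hat t_{n,m} - t_{n,m}| = O(n^{-1/2} + m^{-1/2}(\log\log m)^{1/2}) \quad \text{a.s.} \tag{A.65} \]

and

\[ |\hat t_{n,m} - t_{n,m}| = O(n^{-1/2} + m^{-1/2}) \quad \text{in probability.} \tag{A.66} \]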
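Below, we prove (A.65) and (A.66). We will show that

\[ |\hat t_{n,m} - t_1| = O\Big(\frac{1}{\sqrt{n}} + \Big(\frac{\log\log m}{m}\Big)^{1/2}\Big) \quad \text{a.s.}, \tag{A.67} \]

\[ |t_{n,m} - t_1| = O\Big(\frac{1}{\sqrt{n}} + \Big(\frac{\log\log m}{m}\Big)^{1/2}\Big) \quad \text{a.s.} \tag{A.68} \]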



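By the uniform law of the iterated logarithm,

\[ \sup_t |p_m(t) - p(t)| = O\Big(\Big(\frac{\log\log m}{m}\Big)^{1/2}\Big) \quad \text{a.s.} \tag{A.69} \]

Therefore, we have

\[ \sup_t |\nu_m(t) - [\alpha p(t) - (1 - \pi_1)P(|Z| > t)]| = O\Big(\Big(\frac{\log\log m}{m}\Big)^{1/2}\Big) \quad \text{a.s.} \tag{A.70} \]

Note that

\[ \alpha p(t) - (1 - \pi_1)P(|Z| > t) - G(t)
 = \alpha(1 - \pi_1)(P(|T_i| > t \mid H_i = 0) - P(|Z| > t))
 + \alpha\pi_1(P(|T_i| > t \mid H_i = 1) - EP(|Z + \sqrt{n}\mu_i/\sigma_i| > t)). \]

From (A.2), we obtain

\[ P(|T_i| > t \mid H_i = 0) - P(|Z| > t) = O\Big(\frac{1}{\sqrt{n}}\Big) \quad \text{a.s.} \tag{A.71} \]

and

\[ P(|T_i| > t \mid H_i = 1) - EP(|Z + \sqrt{n}\mu_i/\sigma_i| > t) = O\Big(\frac{1}{\sqrt{n}}\Big) \quad \text{a.s.} \tag{A.72} \]

Thus, we have

\[ \sup_t |\alpha p(t) - (1 - \pi_1)P(|Z| > t) - G(t)| = O\Big(\frac{1}{\sqrt{n}}\Big) \quad \text{a.s.} \tag{A.73} \]

Taking into account (A.70), we have

\[ \sup_t |\nu_m(t) - G(t)| \le c_2\Big(\frac{1}{\sqrt{n}} + \Big(\frac{\log\log m}{m}\Big)^{1/2}\Big) \quad \text{a.s.} \tag{A.74} \]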






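for some constant $0 < c_2 < \infty$. Below, we show that there exists a finite constant $c_3 > 0$
such that

\[ -c_3\Big(\frac{1}{\sqrt{n}} + \Big(\frac{\log\log m}{m}\Big)^{1/2}\Big) \le \hat t_{n,m} - t_1 \le c_3\Big(\frac{1}{\sqrt{n}} + \Big(\frac{\log\log m}{m}\Big)^{1/2}\Big) \quad \text{a.s.} \]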



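Recalling (A.74), we have, for $\varepsilon = c_3(1/\sqrt{n} + (\log\log m/m)^{1/2})$, that

\[ \nu_m(t_1 + \varepsilon) \ge G(t_1 + \varepsilon) - c_2\Big(\frac{1}{\sqrt{n}} + \Big(\frac{\log\log m}{m}\Big)^{1/2}\Big)
 = G(t_1) + \varepsilon G'(t_1 + \theta_1) - c_2\Big(\frac{1}{\sqrt{n}} + \Big(\frac{\log\log m}{m}\Big)^{1/2}\Big), \]

which exceeds a positive multiple of $1/\sqrt{n} + (\log\log m/m)^{1/2}$
provided that $c_3$ is chosen large enough; here, $0 < \theta_1 < \varepsilon$ and we have used Lemma A.7.
For sufficiently large $m$, we have

\[ \sqrt{m}\,\nu_m(t_1 + \varepsilon) > \tau_m(t_1 + \varepsilon)z_\gamma. \]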

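This proves that

\[ \hat t_{n,m} - t_1 \le c_3\Big(\frac{1}{\sqrt{n}} + \Big(\frac{\log\log m}{m}\Big)^{1/2}\Big) \quad \text{a.s.} \]

Similarly, we have

\[ \hat t_{n,m} - t_1 \ge -c_3\Big(\frac{1}{\sqrt{n}} + \Big(\frac{\log\log m}{m}\Big)^{1/2}\Big) \quad \text{a.s.} \]

This proves (A.67).

Following the same line of proof, we have

\[ |t_{n,m} - t_1| = O\Big(\frac{1}{\sqrt{n}} + \Big(\frac{\log\log m}{m}\Big)^{1/2}\Big). \]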

If we use

\[ \sup_t |p_m(t) - p(t)| = O(m^{-1/2}) \quad \text{in probability,} \tag{A.76} \]

based on the Donsker theorem instead of (A.69), then the same line of proof as for the
a.s. convergence rate gives the rate of convergence in probability, which is

\[ |\hat t_{n,m} - t_{n,m}| = O(n^{-1/2} + m^{-1/2}) \quad \text{in probability.} \]
This completes the proof of (A.65).

Similarly, the critical value for FDR control is bounded, due to the fact that

\[ EP(|Z + \sqrt{n}\mu_i/\sigma_i| > t) \le 1. \]

By (A.60), (A.61), (A.71) and (A.72), we have

\[ \sup_t \Big| \frac{m_0F_0(t)}{m_0F_0(t) + m_1F_1(t)} - \frac{2(1 - \pi_1)\bar\Phi(t)}{p_m(t)} \Big|
 = O\Big(n^{-1/2} + \Big(\frac{\log\log m}{m}\Big)^{1/2}\Big) \quad \text{a.s.} \]

and

\[ \sup_t \Big| \frac{m_0F_0(t)}{m_0F_0(t) + m_1F_1(t)} - \frac{2(1 - \pi_1)\bar\Phi(t)}{p_m(t)} \Big|
 = O(n^{-1/2} + m^{-1/2}) \quad \text{in probability.} \]

Noting that $2(1 - \pi_1)\bar\Phi(t)/[2(1 - \pi_1)\bar\Phi(t) + EP(|Z + \sqrt{n}\mu_i/\sigma_i| > t)]$ is a monotone
decreasing continuous function with respect to $t$, combined with the definitions of the
FDR critical values, (2.34) and (2.35) hold.

The proof for $k$-FWER is the same as that given in Theorem 2.2.

A.5. Proof of Theorem 3.1

For the two-sample t-statistic, the only part we need to show is the boundedness of
$t_{n,m}$ under independence, which will imply the boundedness in the general dependence
case, as happens with the one-sample t-statistic. The remaining results follow along the
same lines as the proof in the one-sample t-statistic setting. Based on Lemma A.8 below,
plus (3.1), and using the same line of proof as in the one-sample t-statistic case, the
boundedness of $t_{n,m}$ holds for two-sample t-statistics.

The proof of the boundedness of $t_{n,m}$ is based on the following asymptotic distribution
of $T^*$ under the alternative hypothesis.

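Lemma A.8. Suppose that $X, X_1, \ldots, X_{n_1}$ are independent and identically distributed
random variables from a population with mean $\mu_1$ and variance $\sigma_1^2$, and $Y, Y_1, \ldots, Y_{n_2}$
are independent and identically distributed random variables from another population
with mean $\mu_2$ and variance $\sigma_2^2$. Assume the sampling processes are independent of each
other. Also, assume that there are $0 < c_1 \le c_2 < \infty$ such that $c_1 \le n_1/n_2 \le c_2$. Let

\[ T^* = \frac{\bar X - \bar Y}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \tag{A.77} \]

where

\[ \bar X = \frac{1}{n_1}\sum_{i=1}^{n_1} X_i, \qquad \bar Y = \frac{1}{n_2}\sum_{i=1}^{n_2} Y_i, \tag{A.78} \]

\[ s_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(X_i - \bar X)^2 \quad \text{and} \quad s_2^2 = \frac{1}{n_2 - 1}\sum_{i=1}^{n_2}(Y_i - \bar Y)^2. \tag{A.79} \]

If $EX^4 < \infty$ and $EY^4 < \infty$, then

\[ P(|T^*| > t) = P\Big(\Big|Z + \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}\Big| > t\Big)(1 + o(1)), \tag{A.80} \]

uniformly in $t = o(n^{1/6})$, where $n = \max\{n_1, n_2\}$.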
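
The statistic in (A.77) and the normal approximation on the right-hand side of (A.80) are straightforward to compute. The following is a minimal sketch that assumes the usual $n_j - 1$ divisor for the sample variances and treats the population means and variances as known inputs to the approximation. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def two_sample_t(xs, ys):
    """T* = (mean(xs) - mean(ys)) / sqrt(s1^2 / n1 + s2^2 / n2)."""
    n1, n2 = len(xs), len(ys)
    s1_sq = variance(xs)  # sample variance with divisor n1 - 1 (assumed convention)
    s2_sq = variance(ys)  # sample variance with divisor n2 - 1
    return (mean(xs) - mean(ys)) / math.sqrt(s1_sq / n1 + s2_sq / n2)


def shifted_normal_tail(t, mu1, mu2, sigma1_sq, sigma2_sq, n1, n2):
    """P(|Z + delta| > t), delta = (mu1 - mu2) / sqrt(sigma1^2 / n1 + sigma2^2 / n2)."""
    delta = (mu1 - mu2) / math.sqrt(sigma1_sq / n1 + sigma2_sq / n2)
    z = NormalDist()
    # P(Z < -t - delta) + P(Z > t - delta)
    return z.cdf(-t - delta) + z.cdf(delta - t)
```

With `mu1 == mu2` the second function reduces to the two-sided null tail $P(|Z| > t)$, which is the quantity used for the null hypotheses throughout the appendix.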

Proof. The proof of this lemma is very similar to the proof of Lemma A.3 and so we
omit the details. □

A.6. Proof of Theorem 3.2

This follows the same arguments as in the one-sample t-statistic case, by virtue of
Lemma A.8.

A.7. Proof of Theorem 3.3

When we plug in an estimator of $P(|T^*| > t)$,
the proof of the two-sample t-statistic case follows along the same lines as its one-sample
counterpart, except that we have to show the rate of convergence under the alternative
hypothesis for the two-sample t-statistic. This follows from the following lemma, which
completes the proof of Theorem 3.3.

Lemma A.9. Let $X, X_1, \ldots, X_{n_1}$ be i.i.d. random variables from a population with mean
$\mu_1$ and variance $\sigma_1^2$, and $Y, Y_1, \ldots, Y_{n_2}$ be i.i.d. random variables from another population
with mean $\mu_2$ and variance $\sigma_2^2$. The sampling processes are assumed to be independent of
each other. Assume that there are $0 < c_1 \le c_2 < \infty$ such that $c_1 \le n_1/n_2 \le c_2$. Let $T^*$ be
defined as in Lemma A.8. If $E|X|^4 < \infty$ and $E|Y|^4 < \infty$, then

\[ \Big| P(T^* \le x) - P\Big(Z + \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \le x\Big) \Big|
 \le \frac{K(1 + |x|)}{\big(1 + \big|x - (\mu_1 - \mu_2)/\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}\big|\big)\sqrt{\min\{n_1, n_2\}}}, \tag{A.81} \]

where $K$ is a finite constant that may depend on $\sigma_1^2$, $\sigma_2^2$, $EX^4$ and $EY^4$.
Proof. Without loss of generality, we assume that $n_1 = b_1n$, $n_2 = b_2n$, $b_1 + b_2 = 1$ with
$b_1 > 0$ and $b_2 > 0$. Note that



P{T* <x) = P 



= P 



< P 



^Jsl/ni +s\/n2 

X ^ fii - {Y - fi2) 
y/alJnZTal/n^ 

X~fii-{Y^fi2) 
y/alJn^T'al/n^ 



<x 



^/sl/ni+ sl/n2 

A^i - M2 ^ ^ \/sl/ni+ sl/n2 
\Jcj\lnx + o\ln2 ~ \/crf/ni + a^/n2 

sf/ni + sl/n2 



Ml - M2 



^/crf/ni +al/n2 



al/ni+ al/n2 



- 1 



where we make use of (A.3). We now apply (A.1) with
and = !-^:7^^^'T, for ni + 1 < i < m + ns- Let 



y/al/m+al/n 



for 1 < I < ni 



yjal/m+al/ 71-2 



Ml - M2 



yjcy'l/ni +o-|/n2 



(T^/ni + erf /n2 

Sl.i/"1 + S2/'^2 



al/ni+ (jl/n2 



- 1 



for $1 \le i \le n_1$, and



A, = -X 



s\/ni 



'2,1 



/"2 



- 1 



al/ni +ct|/?12 

for $n_1 + 1 \le i \le n_1 + n_2$, where $s_{1,i}^2$ is defined as $s_1^2$ with 0 replacing $X_i$ and $s_{2,i}^2$ is defined
as $s_2^2$ with 0 replacing $Y_i$. Noting that



sf/ni + S2/"-2 

a'l/ni+ a\ln2 
we have, by (A.7), that



1 



a\lnx + (T|/n2 



{[s\-a\)lni + {sl^Gl)ln2\ 



E 



sl/ni + S2/"-2 



(Tj/ni + cr|/7i2 



< A' 



.EX'^ + EY^ 



For 1 < j < Til, 



sf/ni+S2/'^2 s\i/ni+ s\/n2 



7T.5;((Tj/ni + cr|/n2)' 
by (A.8). Similarly, for $n_1 + 1 \le i \le n_1 + n_2$, we have



E 



al/ni + al/n2 crl/ni + a^/n2 J n|(crj/ni + cr|/ri2) 



7E{4 - 4d < 



KEY^ 



It follows that 



P |A|> 



llA!l2<i^ 



\x\VEJC^+EY^ 



< < < K^EX^ + EY^ 



U +1 



I— 1 ^ 

, E\X\^ + E\Y\^ 



Therefore, by (A.1),

X-^i^-{Y-ii2) 



P 



s\/ni + S2/'^2 



< A'- 



- 1 



(1 + |a; - (/^i - ^l2)/^/(yl/nl+(Tl/n2\)^/n 



Similarly, 

P{T* <x)=P 
> P 



A - /ii - (y - /i2) ^ Ml - /i2 
A - /ii - (y - M2) 

\/o\lni +cr|/n2 



■\/o-^/ni + cr|/7i2 



a\lni + cr|/n2 



- 1 



and 



P 



< X- 



A-/^i-(y-A^2) 
^0-^/711 +CT|/n2 ~ ^(T^/rii +o-f/n2 

A^i - ^^2 



This proves (A.81). 



< A'- 



1 + b-l 



(1 + |x - (/ii - [i2)l\/(j\lnx +(jl/n2\)y/n 



□ 